Improving emergency and admission care in low-resource, high mortality hospital settings—not as easy as A, B and C

The number and quality of clinical trials examining specific alternative disease treatments (A vs B) in low-resource settings (LRSs) have increased because of dedicated funding streams. Rigorous evaluations of innovations or improvements in service delivery, such as that now reported by Hategeka et al., are far fewer (Hategeka et al., 2021). In this example the intervention (an ETAT+ package (Irimu et al., 2008)) aimed to promote use of multiple recommended practices to improve quality of hospital care in Rwanda. These included changing the way individual health workers and teams organize their practice (e.g. care guided by the A, B and C's for triage and resuscitation) and correctly identifying and treating children and newborns with specific conditions (e.g. diagnosis and management of severe dehydration or sepsis) (Hategeka et al., 2021).
In many ways improving service delivery seems self-evidently a good thing. Who, anywhere in the world, would not wish their sick child or newborn to have access to a hospital where health workers have the systems and skills to handle serious illness effectively? So, what should we make of a controlled interrupted time-series study of a multi-faceted intervention to achieve this in Rwanda that had no statistically significant impact on all-cause neonatal and paediatric hospital mortality, although a possible impact on the case-fatality of targeted neonatal conditions (Hategeka et al., 2021)? For some, the answer will be simple. Strictly, this study provides insufficient evidence that the intervention works well enough, so why use it? For others, providing evidence on what improves hospital care and outcomes is more complicated. For yet others, the answer is, well, complex.
Making things complicated are the number and range of issues that make designing, conducting and interpreting multi-faceted interventions targeting clinical care in hospitals problematic. I reflect briefly on just three: how interventions work, achieving control in comparative studies and choosing the right measures of success. Firstly, we ought to be able to articulate how our intervention should achieve our desired outcomes. Logic models, theories of change and directed acyclic graphs may help and may be supported by other approaches for understanding implementation (Michie et al., 2011; Powell et al., 2015; De Silva et al., 2014). This conceptual thinking should identify the key intermediary clinical practice changes needed to produce mortality effects, if this is the goal. Ideally, prospectively developed models then drive specific data collection on important intermediary effects. For example, in studies testing interventions to improve triage, emergency and immediate admission care, impacts on mortality seem predicated on improving the speed and accuracy with which sick children are seen, assessed and managed. So, do such intermediary outcomes change?
Here, we need to distinguish evaluating intermediate effects from fidelity of intervention delivery (Moore et al., 2015). The latter is primarily concerned with whether or how well what is planned is done. Failure to implement may explain why both key intermediary effects and final outcomes are not achieved. Conversely, if implementation goes as planned but key intermediary effects are not seen, how do we interpret any change in our main outcome measure? Examining intermediary effects early may even call into question the value of proceeding to a multi-year mortality endpoint and limit research waste. Unfortunately, an overarching challenge with service delivery interventions and models of them is that they can become hugely complicated. A recent typology included over 70 distinct implementation components that might be deployed (Powell et al., 2015). In one Kenyan hospital improvement project, over 20 were deployed simultaneously, and evaluating their successful delivery and multiple intermediary effects can become an overwhelming task (English et al., 2017).
Such complicated interventions also pose challenges for 'controlled' studies. Hategeka et al.'s study design is said to control for 'differences in characteristics between intervention and control hospitals that remained constant or changed over time' (Hategeka et al., 2021). When hospitals are the units of intervention the numbers involved in trials are typically few. Conversely, the number of factors that may act as confounders or introduce bias is extremely large, and many may be important but unmeasurable (e.g. clinical leadership) (English et al., 2011). Is it then reasonable to expect that two groups of hospitals are 'balanced' with respect to a huge array of largely unknown but influential factors? Furthermore, such factors may vary across place over time. For example, an influential clinical leader is transferred from one hospital to another. How is such a change controlled for other than by assuming that changes 'balance out' over time between intervention and control groups? Is this likely when relatively small numbers of organisations are studied? If we cannot verify assumptions of balance, how safe are we ever in making assumptions of internal validity? (English et al., 2011).
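The concern about balance with few hospitals can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of any study cited here: it supposes a single unmeasured binary factor (say, strong clinical leadership) that is present in each hospital independently with a hypothetical prevalence of 0.5, and it calls two arms 'imbalanced' when the proportion of hospitals with the factor differs by at least a hypothetical threshold of 0.25. The function name and all parameter values are illustrative.

```python
import random


def prob_imbalance(hospitals_per_arm, prevalence=0.5, threshold=0.25,
                   n_sims=5000, seed=1):
    """Estimate the chance that two arms differ by at least `threshold`
    in the proportion of hospitals with one unmeasured binary factor.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    imbalanced = 0
    for _ in range(n_sims):
        # Count hospitals with the factor in each arm.
        arm_a = sum(rng.random() < prevalence for _ in range(hospitals_per_arm))
        arm_b = sum(rng.random() < prevalence for _ in range(hospitals_per_arm))
        if abs(arm_a - arm_b) / hospitals_per_arm >= threshold:
            imbalanced += 1
    return imbalanced / n_sims


if __name__ == "__main__":
    for k in (4, 10, 50, 200):
        print(f"{k:>3} hospitals per arm: "
              f"P(imbalance) ~ {prob_imbalance(k):.3f}")
```

Under these assumptions, roughly seven in ten comparisons with four hospitals per arm are imbalanced on this one factor, while with 200 per arm such imbalance almost never occurs. With many influential factors rather than one, the chance that a small comparison is balanced on all of them is smaller still.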
Impacts on mortality to justify intervention in high-mortality settings are often what we desperately want and what funders demand. Achieving them typically means improving many intermediary aspects of quality of care, as discussed above. However, is mortality a good aggregate measure of quality of care? Many would say no, because mortality is highly dependent on case-severity and case-mix that vary across place and time (Lilford and Pronovost, 2010). For example, we have seen three-fold variation in mortality across hospitals in Kenya. Adjusting for these factors requires detailed individual patient data, which are rarely available. Perhaps more critically, mortality is strongly influenced by the cumulative quality and safety of care (or its absence) prior to and over days or even weeks of admission. So how reasonable is it to expect that improving important but time-bound aspects of care will impact mortality when whole systems are weak? Some trials of service delivery interventions have demonstrated effects on hospital mortality in LRSs, but these have employed randomization at the individual level and well-resourced, sustained and trial-supported change efforts (Biai et al., 2007; WHO Immediate KMC Study Group, 2021). When interpreting these, the contribution of an often supernumerary 'trial team' is frequently ignored as an input. So, if we cannot demonstrate improved mortality, are interventions that (only) improve care processes useful? Very small improvements in hospital mortality that are almost impossible to 'prove' to be associated with an intervention may still be highly cost-effective (Barasa et al., 2012). So should interventions that do not impact mortality but that improve adoption of multiple evidence-based therapies or management steps be taken to scale?
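The dependence of crude mortality on case-mix can be shown with simple arithmetic. The sketch below uses entirely hypothetical numbers, not data from Kenya or from any cited study: two hospitals are given identical case fatality within each severity stratum (that is, the same quality of care by this measure) but admit different mixes of patients. Hospital names, strata and values are assumptions made for illustration.

```python
# Hypothetical stratum-specific case fatality, identical in both hospitals.
CASE_FATALITY = {"mild": 0.01, "moderate": 0.05, "severe": 0.30}

# Hypothetical admissions by severity stratum (1000 per hospital).
ADMISSIONS = {
    "Hospital A": {"mild": 700, "moderate": 250, "severe": 50},
    "Hospital B": {"mild": 300, "moderate": 400, "severe": 300},
}


def crude_mortality(case_mix, fatality):
    """Deaths divided by admissions, ignoring severity."""
    deaths = sum(n * fatality[stratum] for stratum, n in case_mix.items())
    return deaths / sum(case_mix.values())


def standardized_mortality(fatality, reference_mix):
    """Direct standardization: apply a hospital's stratum-specific
    fatality to a common reference case-mix."""
    return crude_mortality(reference_mix, fatality)


# Reference case-mix: both hospitals pooled.
REFERENCE = {
    stratum: sum(mix[stratum] for mix in ADMISSIONS.values())
    for stratum in CASE_FATALITY
}

if __name__ == "__main__":
    for name, mix in ADMISSIONS.items():
        crude = crude_mortality(mix, CASE_FATALITY)
        adjusted = standardized_mortality(CASE_FATALITY, REFERENCE)
        print(f"{name}: crude {crude:.1%}, standardized {adjusted:.1%}")
```

With these invented figures, crude mortality is about 3.5% in Hospital A and 11.3% in Hospital B, a more than three-fold difference, yet the standardized rates are identical because the stratum-specific fatality is the same. The adjustment is only possible here because severity is recorded for every patient, which is exactly the individual-level data that are rarely available.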
I outline above how the challenges of statistically testing A vs B interventions while characterizing and accounting for every complicated aspect of context and intervention delivery may be insuperable. Some now regard pursuit of this elaborate but still reductive approach as missing the point, because in high- and lower-resource settings we are dealing with Complex (Adaptive) Systems. These defy explanation based on linear cause and effect models, and it is beyond the scope of this article to explain these ideas in full (but to introduce the topic see (Greenhalgh and Papoutsi, 2018)). Their importance lies in what they mean for evaluators of service delivery interventions. Such evaluations are very rarely simple A vs B comparisons. Nor are they even tests of A + B + C intervention packages complicated by factors X, Y and Z (and more), representing measurable variations in context or fidelity. Instead, to develop the field we need strategies for evaluating service delivery interventions that pay attention to complexity, and clinical researchers need to partner with and learn from social scientists, economists, engineers and others, employing methods these disciplines have often developed. Also critical is ensuring those with knowledge of the context and systems are central to evaluation, as 'insider knowledge' is key to building understanding. At the same time, we should also examine outcomes valued by policy makers, practitioners and communities, who ultimately have the task of sustaining implementation (Gilson et al., 2021). Embedded, multi-disciplinary research linked to collaborative learning platforms with good historical data on context and key outcomes may be especially helpful in this regard (English et al., 2021).

Data availability statement
Data are not presented or used in this commentary.
Ethical approval. Ethical approval is not required for this commentary.
Conflict of interest statement. M.E. created the initial ETAT+ course and some implementation strategies and received funding to test their effects in earlier research and he is currently involved in research on the implementation of improved care in Kenyan hospitals.