The European Healthy Cities project can be characterized as a social movement that employs an extremely wide range of political, social and behavioural interventions for the development and sustenance of urban population health. At all of these levels, the movement is inspired by ideological, theoretical and evidence-based perspectives. The result of this stance is a dynamic, complex and diverse landscape of initiatives, plans, programmes and actions. In quantitative terms (the number of WHO designated cities and associated cities and communities through national networks), ‘Healthy Cities’ can be regarded as an extraordinary accomplishment and a credit for both WHO and cities in the movement. In qualitative terms, however, critics of the movement have maintained that little evidence on its success and effectiveness has been generated. This critique finds its foundations in the mere perceptions of evidence, the politics of science and urban governance, and perspectives on the preferred or professed utilities of evidence-based health notions. The article reviews the nature of evidence and its interface with politics and governance. Applying a conceptual framework combining insights from knowledge utilization theory, theoretical perspectives on (health) policy development, theory-based evaluations and planned intervention approaches, it demonstrates that, although the evidence is overwhelming, there are barriers to the implementation of such evidence that should be further addressed by ‘Healthy Cities’.
In this article, we will review theoretical and methodological considerations that should guide the substantiation of evidence of effectiveness of Healthy Cities. In order to do this, we will first present our understanding of the conceptualization of evidence. It will be our assertion that true evidence must be related to practice considerations, and not just draw on esoteric (often academic) premises.
We will then translate that position into a new appraisal of the actions of Healthy Cities, their networks and governance systems. Healthy Cities operations draw on a social model of health and the ambition to address more distal determinants of health, including institutional change for health. Interventions at those levels are far more complex than those aimed at more proximal—or behavioural—determinants. This generates particular challenges to the generation of evidence.
The Healthy Cities movement has been a success since its inception in the 1980s. Drawing on innovations in health promotion, urban planning, ecosystemic perspectives and the move towards decentralization of government services, community-based work and intersectoral action, many thousands of cities around the world have felt that the Healthy Cities conceptualization would provide added value to urban performance, including in the health and sustainable development arena. This in itself is an indicator for the accomplishment of the initial motivation to develop a programme that would demonstrate the feasibility of locality-based health development.
However, in the 20 years of the WHO European Healthy Cities Network (WHO-EHCN), the development of the science and methodology of public health and health promotion seems to have caught up with the innovative stance of the movement, and in some cases the sophistication of their methods and theories seems to have overtaken the urban praxis. Specifically, there is an overwhelming urge to produce evidence of effectiveness for the full range of health promotion and public health interventions, including Healthy Cities. This urge is driven partly by political and cost considerations, and partly by the need to deliver good practice.
What should be understood by evidence? McQueen and Anderson (2001) quote Butcher:
A piece of evidence is a fact or datum that is used, or could be used, in making a decision or judgement or in solving a problem. The evidence, when used with the canons of good reasoning and principles of valuation, answers the question why, when asked of a judgement, decision, or action. (p.64)
There are some unresolved issues in using such a perspective. In particular, researchers who equate science with the use of experimental methodological designs would criticize this position as an invitation to use almost any data or opinion as evidence. We will explore precisely this methodological tension.
In a position paper by the European Advisory Committee on Health Research (Banta, 2004), the relations between public health, decision making, research, knowledge generation and evidence are presented. The Committee acknowledges the many facets of evidence for public health and singles out healthy cities as a prime challenge in the amalgamation of evidence:
(…) a legitimate concern is that research in many areas of ‘the new public health’ aims at actions that are difficult to evaluate, such as those in health promotion. For example, what is a ‘healthy city’ and what are the general and specific outcomes sought? Because of these difficulties, decisions that are mainly determined by good evidence of effectiveness would favor interventions with a medical rather than a social focus, those that target individuals rather than communities and populations, and those that focus on the influence of proximal rather than distal determinants of health. This would clearly be unsatisfactory for population health activities. (Banta, 2004, p. 566)
Eriksson has further mapped these problems. He proposes a distinction between four generations of ‘prevention projects’ (I. clinical; II. bioepidemiological; III. socioepidemiological; and IV. environment & policy-oriented), based on different theoretical propositions, each of which needs increasingly complex evaluation approaches as well as outcome parameters (Eriksson, 2000). With his differentiations, Eriksson points to an important development within public health research, stretched out over decades: an increased recognition that much can be gained, especially in terms of reaching many people by changing program delivery or policy, by supplementing the efforts to identify individual determinants of health and health behaviour with a focus on social and environmental factors. Such recognition has subsequently provoked efforts to measure, for instance, the impact of manipulating broader determinants of health, and discussions on how to expand intervention goals beyond the individual to various community levels.
Birckmayer and Weiss have demonstrated that application of theory-based evaluation (TBE) yields better research information on various elements of success and failure of health promotion programs (Birckmayer and Weiss, 2000). TBE expects researchers and program directors to spell out assumptions to a micro-theoretical level, so that outcomes are not just made evident, but can also be explained. This perspective offers opportunities to integrate intra-generational ‘prevention projects’ such as Healthy Cities, drawing heavily on the approaches that Eriksson calls socioepidemiological and environment & policy-oriented, and thus unravel and analyse their various components.
These perspectives give, however, indications of how evidence is to be produced, but not to which purpose. The notion of utility-driven evidence (UDE) (de Leeuw and Skovgaard, 2005) is based on the observations that:
the generation of evidence serves a purpose beyond mere intellectual curiosity (McQueen and Anderson, 2001);
(health) policy-making takes place in complex interaction between stakeholders (McDougall and de Leeuw, 2006);
the application of evidence in decision-making argumentation may transform ‘facts’ into ‘beliefs’, and ‘beliefs’ into ‘facts’ (de Leeuw, 1989). Cummins and Macintyre (Cummins and Macintyre, 2002) have described this phenomenon as leading to ‘factoids’.
Eriksson has endeavoured to identify relevant evaluation strategies for each (increasingly complex) intervention type. In his perspective, the amalgamation of evaluation strategies and their outcomes would lead to compelling evidence for decision-making (Eriksson, 2000). Tones has argued that evidence is multi-dimensional, and that measures of success are an assembly of different types of evidence, such as witness accounts, expert testimony, lab tests, etc. (Tones, 1997). In short, Eriksson has established an academic, and Tones a social evidence paradigm. Neither, however, speaks out on the question to which purpose either type of evidence is generated. We argue for an overarching utilitarian evidence paradigm: whether taking a social or purely scientific perspective, the producers of evidence should take into account how their products may be used in shaping good practice, and guiding policy choices.
However, policy decision-making is a messy affair, by some described as ‘muddling through’ (Lindblom and Cohen, 1979), or a negotiated space (Stone, 1997). Kingdon has demonstrated that windows of opportunity for policy decision-making are created when policy entrepreneurs have applied a process of ‘alternative specification’ in which different representations of the same ‘truth’ are presented to stakeholders in the process (Kingdon, 1995). This implies that the evidence used in alternative specification may take different shapes for different stakeholders.
The perspective is shared by Weiss (Weiss, 1979, 1998) and Vedung (Vedung, 2000). In their work on research utilization (or, in our terms, the application of evidence for decision-making purposes), they maintain that research is put into action through different strategies. Six models are proposed:
The knowledge-driven model: new knowledge will lead to new applications, and thus new policies. An example could be fundamental research into nuclear resonance signals, leading to the development of Nuclear Magnetic Resonance and Magnetic Resonance Imaging scanners, the emergence of which led to medical technology assessments to assist governments in deciding where and how the costly new technology could be implemented.
The problem-solving model: research findings are actively sought, and used for pending decisions. In their ideal form, health impact assessments (HIAs) are an instrument in this model; HIAs supposedly are commissioned to guide decision-making related to proposed profound environmental and social change operations.
The interactive model: incremental policy change is interactively driven back and forth by emerging research outcomes. The current Swedish national health policy, which took some 20 years to establish, is an exemplary application of this model.
The political model: research is used to support partisan political positions. Debates around the acceptability of nuclear power demonstrate the different political connections to different research perspectives.
The tactical model: the fact that research is being undertaken may be an excuse for delaying decisions, or for deflecting criticism.
The enlightenment model: concepts and theoretical perspectives that social science research has engendered permeate the policy-making process, rather than single studies or research programmes having a discernible impact on policy priorities.
The compelling conclusion is that social and academic evidence, even when produced properly, may not have a significant impact on policy. We thus take the position that evidence, if it is to be used by policy entrepreneurs, should be utility-driven: its generation should, within parameters of scientific rigour, take into account how it can be used to influence the judgement of policy stakeholders on how success is to be defined.
In order to review this position, in this paper we will look at Healthy Cities methodologies, the application of theories and the emerging meta-theory that may guide future evidence generation in complex health promotion interventions.
METHODOLOGY IN HEALTHY CITIES
‘Methodology’ is the logic of method. Methods (from Greek μέθοδος, ‘guide’) are procedures that guide a certain activity. This activity may be the implementation of tools which might, for instance, lead to community empowerment. On the other hand, in the research domain, methods guide the collection of data.
In either case, the choice of method depends on a certain frame of reference. Sometimes this frame of reference is implicit, and that would make a justification of why a certain method is employed difficult. For instance, we all know situations where the implicit assumption around lifestyle problems is that they are caused by a lack of knowledge, and that thus the method of choice should be to increase knowledge through information (health education). When this chain of assumptions is carefully scrutinized, other methods might emerge as being more adequate in dealing with the problem (e.g. the implementation of laws and regulations). Particularly in the field of research, one needs to be explicit on this frame of reference, so as to make research valid and reliable. This being explicit about choices made leading to a certain method (or set of methods) is ‘methodology’.
Methodology is therefore closely related to an existing frame of reference. In most scientific fields, this frame of reference is a theory. A theory is a set of conceptualizations of reality that allows for predictive statements. For instance, a theory could be ‘All swans are white’. The predictive statement would be ‘Next time I see a swan, it will certainly be white’. The methodology would then dictate that we would have to observe all big birds, identify them as being swans or not, and determine their colour. The resulting methods (i.e. guides for the collection of relevant data) would dictate: (a) observe big birds; (b) apply a selection tool that identifies them either as swans or as some other bird (which would then be dropped from our inquiry); (c) apply a selection tool that distinguishes between ‘white’ and ‘non-white’. As soon as we have a non-white swan, we would have to refine our theory.
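The three methods (a) to (c) can be written out as a small procedure. The following Python sketch is an illustration only: the `Bird` record, the `find_counterexample` function and the sample observations are hypothetical names and made-up data, not part of any Healthy Cities instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bird:
    """One observed big bird (hypothetical record for this illustration)."""
    species: str
    colour: str


def find_counterexample(observations):
    """Return the first non-white swan, or None if the theory survives."""
    for bird in observations:        # (a) observe big birds
        if bird.species != "swan":   # (b) drop anything that is not a swan
            continue
        if bird.colour != "white":   # (c) distinguish white from non-white
            return bird              # the theory needs to be refined
    return None


# Made-up observations, purely for illustration.
observations = [
    Bird("swan", "white"),
    Bird("goose", "grey"),
    Bird("swan", "black"),
]
counterexample = find_counterexample(observations)
```

A result of `None` would mean only that the theory has not yet been contradicted by the data collected, not that it has been proven.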
Most theories, unfortunately, are not as clear-cut as the ‘Swan-Colour Theory’ presented here. They may consist of elaborate sub-sets of conceptualizations which often describe reality in rather abstract terms. For instance, social psychological theories in health behaviour would speak of ‘attitude’, ‘belief’, and other notions which do not immediately seem to be connected to a reality a health entrepreneur is facing in her daily work. But this situation (which might be described by some as ‘vagueness’) makes it all the more important to be explicit.
HEALTHY CITIES METHODOLOGY: THE PRAXIS ANGLE
Before we can start exploring the evidence base relevant for Healthy Cities, we need to say something about Healthy Cities methodology. As stated earlier, scientific methodology has a relation with a frame of reference (or ‘theory’). Applied to the Healthy Cities movement, this means we will have to review some of its conceptual foundations, and more specifically, the assumptions that guide Healthy Cities work.
Obviously, there is no one ‘Healthy Cities Theory’. Over the years, Healthy Cities have amalgamated a number of approaches to the promotion of health in urban settings. For instance, one guiding principle of the Healthy Cities movement (which can be traced back to visionary statements such as the Declaration of Alma Ata, the WHO European Health for All strategy, and advances in health promotion research) has been community action. There is a wealth of theoretical approaches to community action (let alone an enormous body of work on ‘empowerment’, more recently influenced by the emergence of studies into social capital and health). Generally, in the area of community action, three distinct models have been identified, with different roles for communities and professionals. Each of these models has advantages and disadvantages, and their application depends on existing contextual phenomena, such as political configuration (conservative governments tend to choose another model than social-democratic governments), demography (areas with a population with overall higher social-economic status tend to approach community action differently from areas with more vulnerable populations) and tradition (Nordic welfare states traditionally involve communities in decision-making in radically different ways from most countries in central and eastern Europe).
This situation does not only apply to ‘community action’. It is equally valid for other foundations of the Healthy Cities movement, such as striving for equity in health, sustainable development and approaches to organizational development (intersectoral action, networking) and policy development. In each of these fields, there is a variety of (theoretical) conceptualizations identifying different models relevant for different unique contexts.
However, there is what we might call a ‘meta-theory’ that drives Healthy Cities. Such a meta-theory describes a structure in which other theoretical elements find a place. It is ‘a theory about theory’. Over the years and through the different phases of the WHO European Healthy Cities Network (WHO-EHCN), this meta-theory appears to have been refined, from a broad ‘demonstration project’ approach in the first phase (1987–1992) to a system in which cities had to demonstrate eligibility to enter into the network, and subsequent meeting of a set of designation criteria (third phase, 1998–2002).
It is our position that the meta-theory for Healthy Cities has significance beyond specific project parameters established for the different phases of the European WHO Project. For one, such a meta-theory should also be relevant for Healthy Cities outside the ‘official’ scope of the WHO-EHCN (and we know that there are several thousand such cities globally).
In Figure 1, the meta-theoretical perspective is visualized. In the figure, three main components of the meta-theory are identified: proximal and distal determinants of health, proximal and distal interventions for health, and ‘known impact’.
In the determinants field, the determinants circle itself, and the overarching notion of ‘institutions’ are the important elements. This is not the place to review extensively the existing evidence on determinants of health; readers are referred to, for instance, Wilkinson and Marmot (2005) or Berkman and Kawachi (2000). Conceptually, in the model presented here, it is clear that lifestyles, genetics/human biology and environment do not play equal roles in determining health in populations. For each unique population health issue in its unique context, there is a unique mix between determinants of health. To a significant extent, the observed and valued degree of impact of a determinant on the health problem is the result of existing institutions in the given context.
The use of the word ‘institution’ merits some clarification. Basically, institutions are systems of order that create, act on, preserve and legitimize complex forms of common knowledge. Given the task of stabilizing the identity of a society, institutions emerge from what Edmund Burke calls an act of constitution: that is, institutions enact norms necessary for social problem-solving. Hannah Arendt further said that an institution is a body of people and thought that endeavours to make good on common expressions of human purpose.
In our meta-theory, ‘institutions’ therefore stand for ideals and perspectives that have sometimes been labelled as normative or value-laden, such as equity in health, sustainable development and communicative qualities in human and organizational relations. Again, the value of a meta-theory is demonstrated here: on an aggregate level, Healthy Cities agree that these are important aspects of their work, but in the specific context for each city these institutions may take different shapes, employ different (political) paradigms and guide different operational actions.
Institutions also impact on parameters of different proximal and distal interventions for health. In the more traditional literature on determinants of health, ‘health care’ (or sometimes ‘health systems’) is presented on a par with lifestyles, genetics/human biology and environmental factors [e.g. the Lalonde Report (Lalonde, 1974)]. We would argue that there is a qualitative difference between those four ‘determinants’. Although health care of course impacts on health status, it has much more of a deliberate intervention concept as its guiding principle than do the other three.
The provision of health care (or the maintenance of health systems) in many countries is the ‘default setting’ in promoting individual and community health. However, again we would have to ascertain that many municipal authorities do not have competencies in the field of providing medical care (Green, 1998). Within the Healthy Cities conceptualization, more distal interventions for health are considered more important. These include community organization and social development, organizational and infrastructure development and policy development. Again, it is clear that for any specific population health issue, there is a unique combination of these interventions that might work best in any unique (urban) context. As one of the main tenets of the Healthy Cities meta-theory is ‘to put health high on social and political agendas’ (Tsouros, 1994), these three developmental aspects are found to be the pillars upon which such agenda-building should rest.
It can be argued that the requirement in the third phase of the WHO-EHCN to establish a city health development plan (CHDP) endeavoured to ingrain a set of norms, values and organizational behaviours which ‘institutionalizes’ (in the sense used here) a specific approach to urban health.
The third column in the graphic representation of our Healthy Cities meta-theory is ‘known impact’. Here we have to draw upon the increasing body of knowledge in evidence-based health promotion. The Cochrane Collaboration in Public Health and Health Promotion (CHPPHF, 2002), the reviews by the British Health Development Agency (HDA, 2002), the PREFFI instrument developed by the Netherlands’ National Institute for Health Promotion and Disease Prevention (NIGZ, 2002) all provide lists of characteristics of effective interventions. The International Union for Health Promotion and Education (IUHPE, 2000a, b) produced a report with an assessment of 20 years of evidence of the health, social, economic and political impacts of health promotion. Generally, the findings from these reviews demonstrate that distal interventions for health in their appropriate ‘mix’ provide a broad and sustainable effect on population health, whereas proximal interventions for health (health and patient education, health care) yield focused health gains (often disease, gender, and age-group specific) against relatively high cost. However, the further up the scale (the more distal the determinant and intervention mix), the more complex the associated methodology. This creates problems both for an effective argumentation of Healthy Cities actions (‘What does policy development effectively do for health?’) as well as for the establishment of a methodology for Healthy Cities research.
In the chronology of WHO Regional Office for Europe commissioned, sponsored and supported Healthy Cities evaluations, we can see an evolving ambition equal to the logic of this model. Health education and more traditional health promotion endeavours, and their associated assessments, two decades ago typically focused on the lower end of the model. Characteristically, lifestyle and environmental health issues were high on the agenda of local health services and government agencies. Throughout the phases of the WHO-EHCN, there has been an endeavour to shift attention more to the upper end of the model. The charge of the development of city health plans (Phase II) was obviously to integrate ‘lower end’ perspectives and interventions into a more comprehensive package; CHDPs (Phase III) were to address that integrated package through more institution-focused intervention frameworks. Clearly, this has presented complex theoretical and methodological challenges to evaluation endeavours.
HEALTHY CITIES RESEARCH METHODOLOGY
In this section, we will try to do two things. First, we hope to establish a general perspective on research methodology in Healthy Cities, irrespective of their association with local, national or international legal or moral requirements; in some countries, national or regional authorities require local governments to publish health plans which would set unique parameters for associated Healthy Cities research. These requirements differ from place to place, dependent on existing competences and traditions. One set of requirements which has wider importance is the set of designation criteria for entry into the WHO-EHCN. These have led to a series of tools and methodological approaches that could be regarded as exemplary in Healthy Cities research. This will be our second focus.
In order to establish a general Healthy Cities research methodology, we will have to expand our first figure with components dealing with inquiry systems.
Relatively simple problems are being addressed by relatively simple theories with relatively simple methodologies. The question whether a new pharmaceutical product is effective (the straightforward theory is ‘Pill X cures disease Y’) is dealt with through a methodological approach which has become the ‘gold standard’ in the health sciences: the randomized controlled trial (RCT). RCTs dictate that two matched populations are established (an ‘experimental’ and a ‘control’ group). The supposedly effective intervention is administered to one population, and the other population receives an intervention which is known to be ineffective. Neither the researchers nor the populations are aware of which group receives which type of intervention (this is called a ‘double-blind’ design). Any significant test results can supposedly be attributed to the effectiveness of the intervention, as all ‘confounding factors’ (outside factors that might influence measurements and effectiveness) have been cancelled out by the research design. Appropriate application of the RCT methodology is based on a number of assumptions: the experimental and control groups are homogeneous (often ‘healthy men between 18 and 60’) and test conditions have been randomized completely (any factors that might influence the test procedures are distributed randomly in the populations) so as to allow for statistical analysis.
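The allocation and blinding steps of such a trial can be illustrated in a few lines. The following Python sketch is a simplified illustration under stated assumptions: the function names `randomize` and `blind`, the arm codes ‘A’ and ‘B’ and the participant identifiers are all hypothetical, and a real trial would use a registered allocation protocol rather than this minimal shuffle.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into two equally sized arms."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    allocation = {p: "experimental" for p in shuffled[:half]}
    allocation.update({p: "control" for p in shuffled[half:]})
    return allocation


def blind(allocation, seed=None):
    """Replace arm names with opaque codes.

    Returns the blinded allocation and the key. In a double-blind design
    the key would be held by a third party until the analysis is complete.
    """
    rng = random.Random(seed)
    codes = ["A", "B"]
    rng.shuffle(codes)
    key = dict(zip(["experimental", "control"], codes))
    blinded = {p: key[arm] for p, arm in allocation.items()}
    return blinded, key


# Hypothetical participant identifiers.
allocation = randomize(range(10), seed=1)
blinded, key = blind(allocation, seed=2)
```

Random allocation is what distributes unknown confounding factors evenly across the two arms on average; it is exactly this step that cannot be reproduced when the unit of intervention is a whole city.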
Complex social issues (such as those dealt with in Healthy Cities settings) might not be assessed appropriately with the RCT approach. As will have become clear from the above, the conceptual frameworks guiding the research endeavour are far more complicated and intricate, leading to questions which cannot be resolved through the ‘experiment-control’ notion. Health scientists with roots in this tradition therefore try to apply the ‘quasi-experimental’ research design. In Figure 2, in the open box ‘inquiry systems’, we are moving towards a more naturalistic evaluation approach. In quasi-experimental designs, investigators recognize that conditions cannot always be randomized, and that ‘real’ populations are not as homogeneous as the RCT approach assumes. An added feature of quasi-experimental designs is therefore that measurements take place at various points in time (a T0 measure before the intervention, and T1−n measures during and after the intervention) and in different natural settings with a high degree of similarity (e.g. neighbourhoods with comparable demographic profiles). In Healthy Cities, apart from very practical considerations (what would, for instance, be the ‘control’ setting if the town of Horsens is the ‘experimental’ setting?), socio-political dynamics will often not allow for such a methodology. There may be elections, and political priorities may change during the inquiry period. The economy may experience an upswing, a new factory is opened and socioeconomic status in the neighbourhood will (slowly) change. The housing authority may all of a sudden decide to redevelop an intervention setting and there is an influx of people with entirely different characteristics from those assumed by the research design.
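One common way of analysing repeated measures from an intervention setting and a comparable setting is a difference-in-differences calculation. The article does not name this technique; it is introduced here only to illustrate the T0 and T1−n logic. The Python sketch below uses made-up numbers, and the function names are hypothetical.

```python
from statistics import mean


def change_over_time(series):
    """Mean of the follow-up measures (T1..Tn) minus the baseline (T0)."""
    baseline, *follow_ups = series
    return mean(follow_ups) - baseline


def difference_in_differences(intervention, comparison):
    """Change in the intervention setting beyond the change seen elsewhere."""
    return change_over_time(intervention) - change_over_time(comparison)


# Made-up indicator values at T0, T1 and T2 for two similar neighbourhoods.
intervention_site = [50.0, 54.0, 58.0]
comparison_site = [49.0, 50.0, 52.0]
effect = difference_in_differences(intervention_site, comparison_site)
```

The calculation assumes that both settings would have followed the same trend without the intervention. Elections, economic upswings and redevelopment of the kind described above violate precisely that assumption, which is why such designs are often not tenable in Healthy Cities.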
McQueen and Anderson (McQueen and Anderson, 2001), in their ‘Evaluation in Health Promotion’ chapter, which should be required reading for anyone engaging in Healthy Cities research, eloquently describe the methodological problem:
Unfortunately, many health promotion researchers put the cart before the horse when choosing research methods. They let research methodology drive the investigation, rather than allowing theory and models to provide the conceptual underpinnings for the advancement of knowledge. With such conceptual understanding, investigators can then seek appropriate methods. For instance, many researchers inappropriately use randomised controlled trials in health research. (p. 73)
This quote leads us to the issue of ‘error’. In the philosophy of science, there is general recognition of the existence of two types of error: Error Type I (a null hypothesis is rejected while in fact it is true; in serum testing referred to as a false positive) and Error Type II (a null hypothesis is accepted while it is false; a false negative). Mitroff and Featheringham introduce the concept of Errors Type III (wrong conceptualization of the problem, yet elegant and significant research outcomes) (Mitroff and Featheringham, 1974). An example of an error of the third kind has for a long time been research in the area of poverty and health. The conceptualization of the problem dictated an inquiry into the effects of poverty on health, and indeed, such effects were shown to be profound. Only recently has the reconceptualization of the problem allowed for a more meaningful inquiry highlighting more complex causal pathways between health and poverty, thus opening up a new debate on possible interventions in the realm (WHO Regional Office for Europe, 2002).
Guba and Lincoln argue for an evaluation approach that would prevent Type III Errors from occurring (Guba and Lincoln, 1981, 1989). This ‘Fourth Generation’ or ‘naturalistic’ inquiry includes modalities to deal with ‘messy’, ‘wicked’ (Churchman, 1967) or ‘ill-structured’ (Mitroff and Mason, 1980) problems (de Leeuw, 1989). For Fourth Generation Evaluation, the acronym 4GE (‘forge’) is appropriately chosen, as 4GE is a participatory, dialectic, post-modern scheme of reference ultimately leading to consensus on evaluation parameters, their use and expected outcomes. The 4GE methodology is not unique, extremely innovative or past any current paradigm. Boutilier et al. describe what they call ‘community reflective action research’ that incorporates stakeholder perspectives in policy development (Boutilier et al., 1997). Fourth Generation Evaluation assumes the following steps in the development process: (1) contracting, (2) organizing, (3) identifying stakeholders, (4) developing within-group joint constructions, (5) enlarging joint stakeholder constructions through new information/increased sophistication, (6) sorting out resolved claims, concerns, and issues, (7) prioritizing unresolved items, (8) collecting information/adding sophistication, (9) preparing agenda for negotiation, (10) carrying out the negotiation, (11) reporting and (12) recycling.
Another ‘new’ philosophical approach to effective evaluation in complex socio-political contexts is ‘realist evaluation’ (Pawson and Tilley, 1997). Similar to 4GE, the perspective acknowledges the diverse political and community drivers for the generation of specific types of evidence of effectiveness.
One might too easily assume that 4GE or Realist Evaluation leads to ‘vague’, ‘uncontrollable’ or ‘soft’ (i.e. qualitative) research. This is not correct. The approach simply allows for selecting the right conceptual framework (theory) for a jointly defined problem, and thus leads to the most appropriate methodology—which could in fact be the RCT.
TOWARDS UTILITY-DRIVEN EVIDENCE FOR HEALTHY CITIES
One issue that becomes obvious from a conscious or unconscious application of the 4GE approach is that the logic of method is contextual. Although few ‘traditional’ academics would have the courage to acknowledge it, this fact predates 4GE: a problem could well be investigated through a certain theory and set of methods, but the limiting factors are always human and financial resources.
Suppose we wanted to investigate the impact of Healthy Cities procedures on the reduction of health inequities. The conceptual framework would indicate that the deliberate establishment of CHDPs would contribute to a reduction of inequities, whereas cities that did not establish them would realize no, or less, reduction of health inequity. Following an appropriate logic of method, this would lead to a very elaborate set of methods. These include: a review of historical factors leading to inequities in health in a number of selected (possibly matched) urban settings; social epidemiological data gathering or compilation of relevant data from existing city sources (carefully scrutinizing an appropriate use of existing indicators and/or the application of standardized indicators); a review of urban policies or procedures explicitly or implicitly addressing the equity issue (the conceptual literature would indicate that explicit policies for the reduction of inequities in health might be as effective as general urban socio-economic policies which would thus be addressing the problem implicitly); a selection of cities or their neighbourhoods where an impact may be expected from these implicit or explicit policies versus settings where this might not be expected; process evaluations of the extent to which these policies are factually implemented (including the factors which impede or facilitate development and implementation of these policies); assessments to review the possible influence of participation of cities in Healthy Cities programmes (or similar grander schemes such as, for instance, in Britain the Health Action Zones or in the Netherlands the Ministry of the Interior's Urban Policy); evaluations of subjective and objective benefits of the programmes; and ultimately, the attempt to attribute any changes in health inequities to any of the procedures developed.
Clearly, such a methodology requires an enormous logistical and resource effort. Even if an agency were found to fund such an effort, organizing the evaluation project and putting in place mechanisms for the continuous monitoring of research quality (including measures of reliability and validity) would be almost beyond comprehension. This is why much simpler procedures are applied, often leading to research products which can easily be criticized, or which are not satisfactory to the research constituents (communities, politicians etc.). These procedures follow the ‘logic’ of pragmatism and opportunism much more than the full logic emerging from a conceptually based approach.
Nevertheless, the call for monitoring and evaluation of ‘wicked’ problems such as those addressed by the Healthy Cities project is urgent and necessary. From the very start of the WHO-EHCN, monitoring and evaluation have had a solid position on the agendas of both WHO and partner cities. There has been a natural evolution of methodological perspectives throughout the different phases of WHO-EHCN. These perspectives could be summarized as follows.
Phase I: initiation of the project as a ‘demonstration vehicle’ for the feasibility of urban health development. The resulting methodology emphasized the importance of unique experiences in each of the participating cities based on a joint set of values (including the establishment of health profiles). This led first to the compilation of a series of case studies (also referred to as ‘models of good practice’) which were reviewed and analysed by means of site visits and a project-wide questionnaire (Draper et al., 1993, also Price and Tsouros, 1996). Independently of WHO-led evaluations, there were increasing numbers of studies into the ‘value’ of Healthy Cities in Europe, such as, for instance, Milewa and de Leeuw (1995); Goumans (1998); Goumans and Springett (1997), and beyond [e.g. Werna and Harpham (1995, 1996); Werna et al. (1998)].
Phase II: continuation of the WHO-EHCN through the more rigorous application of principles and preferred procedures (such as the city health plan, intersectoral steering groups and networking approaches). This led to a joint effort between WHO, partner cities and the European Commission to assess such principles and procedures in 10 selected cities from the European Union, and further multiple-case-studies (Yin, 1994) from the entire network in which particularly urban policy-making and networking were emphasized [e.g. (de Leeuw, 1999; Capello, 1999, 2000; Kenzer, 1999)].
Phase III: further formalization of entry requirements for cities (eligibility and designation requirement assessments which served as a baseline measure). The start of this phase was characterized by extensive collaborative work between cities, researchers and WHO to establish a monitoring and evaluation format.
MARI: MONITORING, ASSESSMENT, REPORTING AND IMPACT
Towards the end of the second phase of the WHO-EHCN, as soon as it became clear that a third phase would be politically and practically feasible and in fact requested by many European local governments, the issue of monitoring and evaluation was taken up by WHO and its Healthy Cities governance structures. Cities that wished to be part of the third phase had to commit to a strict set of eligibility and designation requirements. These fell into four categories:
Endorsement of principles and strategies.
Establishment of project infrastructure.
Commitment to specific goals, products, changes and outcomes.
Investment in formal and informal networking and cooperation.
For each of these, there was a set of requirements (WHO Regional Office for Europe, 1997). Cities would have to demonstrate that they met these baseline requirements. A designation application package was to be sent to the WHO Regional Office for Europe, upon which two assessors scored the eligibility for entry on a total of 21 parameters. The assessors would then make a recommendation to WHO whether or not to accept a city.
In close consultation with member cities, it was recognized that this set of eligibility parameters would establish a proper baseline appraisal for further monitoring of progress of cities towards the establishment of preferred outcomes by the end of the Phase. The most critical of those outcomes would be the adoption and implementation of CHDPs reflecting the core values of the project (designation criterion C1: Cities must produce and implement a city health development plan (CHDP) during the third phase, which builds on previous integrative city health planning and reflects the values, principles and objectives of health for all for the twenty-first century and Local Agenda 21; relevant national health strategies; and local city-specific priorities. This plan must have clear long term and short term aims and objectives and a system on how the city will monitor whether these objectives have been met (indicators and evaluation framework).). At the same time, designated cities had also committed to a rigorous approach to such monitoring and evaluation (designation criterion C2: Cities should implement a programme of systematic health monitoring and evaluation, integrated with the city health development plan, to assess the health, environmental and social impact of policies within the city. In addition, cities should strengthen health accountability mechanisms and measures.).
Following the adage ‘Only evaluate what you set out to do’, the designation criteria thus became the parameters against which Healthy Cities developments were to be measured. Further consultations with Healthy Cities representatives at a series of business meetings, and advice from the WHO Healthy Cities Evaluation Advisory Committee, led to an operationalization of the designation criteria into three fields of inquiry, comprising several hundred specific questions:
presence of policies, adherence to principles, and involvement of actors;
processes of change;
results, impact, outcomes and outputs.
This overwhelming multitude of questions was subsequently condensed (after a trial run among cities showing that in its most excessive form an ‘average’ city response would amount to an unmanageable file of hundreds of pages) to an annual reporting template, ART (Table 1).
MARI and ART were attempts at designing a TBE exercise, in which the theory was constituted by the normative, causal and final relationships assumed to govern Healthy Cities operations and development (cf. Milewa and De Leeuw, 1995). The conjecture was also that the application of ART on an annual basis would allow for at least 4 years' worth of information, leading to a time-series analysis of Healthy Cities dynamics. Also, designated cities were encouraged to address the more comprehensive MARI questions at least once during Phase III.
In spite of the extensive and rigorous processes to engage Healthy Cities in the development and application of these research tools, both the quality and the quantity of responses have been an issue of concern all through the phase. In spite of a strict commitment to monitoring and evaluation, at most 50% of cities ever responded reliably in a single year, with almost 50% of this group doing so before established deadlines for submission. Apart from this low response, which seriously compromised rigorous analysis, there were other factors that continued to violate the scientific integrity (validity and reliability) of the method.
In order to enable higher response rates, questions were reformulated during the run of MARI and ART; this reduced the comparability of responses. A large number of cities were designated while the phase was under way, some of them even towards the end of the period. Again, this compromised the intended time-series analysis as new respondents came in. In spite of the extensive processes of pre-testing among respondents, which seemed to have validated key concepts, responses revealed that cities understood these concepts differently. For instance, the notion of ‘empowerment’ turned out to mean ‘engaging with communities’ for some respondents, whereas for others it meant ‘enabling decision-making procedures at neighbourhood level’.
Returning to our earlier postulate that evidence will only be used in policy and practice if it is perceived to be useful (the UDE concept), the relative failure of the MARI and ART exercises also demonstrated that the generation of evidence should be utilitarian; clearly, if urban administrations and their Healthy Cities Coordinators had believed that MARI and ART would establish helpful parameters for the more effective operations of their projects, they would have delivered more substantive data.
In fact, in some cases this has happened. Several national networks of Healthy Cities have applied MARI and ART to assess the operations of their networks successfully [e.g. in Denmark (National Institute of Public Health, 2000) or in Poland (Iwanicka, 2003)], showing the volatile nature of ‘utilitarian evidence’: what is considered useful in one context may not be in another.
The coordinator of the WHO-EHCN, the WHO Regional Office for Europe, in concluding Phase III, recognized the multi-dimensional, complex and diverse agendas of stakeholders in its endeavour. Rather than seeking to establish a grand ‘one size fits all’ evaluation scheme, WHO sought to ‘harvest experience and ‘know how’ on core aspects of the project’, cognizant of the range of such experience and know how in the 56 participants in the third phase.
The logic of method for generation of this ‘evidence’ has been subject to intense debate in WHO, among researchers with a long-standing track record in Healthy Cities evaluations throughout the three phases, and with participant cities. Although it was recognized that the qualitative nature of the problem (harvest experience) would merit face-to-face interviews, content analysis of documentation, focus group sessions and process evaluations in each of the designated cities, the scale and resource implications of such an exercise were, as signalled above, beyond the capacity of the combined resources of all involved. Instead, a more realistic scenario was pursued, in which a questionnaire built around open-ended questions was developed. Background information on embedding the individual Healthy Cities projects in broader urban initiatives was also generated.
The main purpose of this review of Phase III was to draw out lessons, case studies, stories and innovative ideas from the efforts and experiences of the member cities of the WHO-EHCN. The review was structured around six main questions agreed at a business meeting of the WHO European Network in October 2001:
Have cities been successful in forging effective partnerships with other city departments and sectors?
What have cities done to address equity?
Have cities been successful in developing and implementing a city health development plan? What were the scope, breadth and quality of their plan?
To what degree have healthy cities developed working links and synergy with other projects and initiatives such as Agenda 21 and central government programmes?
How did the Healthy Cities project influence local structures and processes relating to health?
How do cities perceive the impact the project has had in their city? Do they have evidence for this?
The areas addressed by these questions are interrelated and were developed in a questionnaire that was sent out to all cities in the WHO-EHCN towards the end of Phase III in July 2002. The response rate was 44/56 = 78%. The cities debated the initial findings at a business meeting of the WHO-EHCN in September 2002, and cities were interviewed to further clarify their responses. This review also included a review of CHDPs, city equity policies and city health profiles and an analysis of baseline healthy city indicators. The review of the WHO European Network in 2002 was complemented by a review of the functions, achievements, specific features and challenges faced by the European national Healthy Cities networks.
This volume reports on evaluation efforts in the third phase of the WHO European Healthy Cities Network. These efforts appropriately reflect an enormous range of research issues, implicit and explicit theoretical frameworks, and methods and methodologies. As outlined earlier, they should be regarded as a package, in context, and as an element in an evolutionary endeavour to compile relevant evidence of effectiveness and experiences of Healthy Cities.
This article is based on an evaluation commissioned by the WHO Regional Office for Europe.