Towards more evidence-based agricultural and food policies

The goal of this paper is to provide insights into how scientific evidence can be used for policymaking and put evidence-based agriculture and food policies at the top of research and policy agendas. We illustrate how scientific evidence can be used in a targeted manner for better policymaking and present an overview of the rich set of ex-ante and ex-post evaluation methods and tools that agricultural economists use for evaluating agricultural policies to provide evidence for policy decisions. We present insights into both established and new/emerging methods and approaches, including their advantages and disadvantages, and discuss their potential use for policy evaluation. We also discuss how methods and approaches should be combined and could be better targeted towards decision-makers. The paper also discusses the crucial role of high-quality data in supporting the science-policy interface. Finally, we present an overview of papers in this special issue titled ‘Evidence-Based Agricultural and Food Policy: The Role of Research for Policy Making’.


Introduction
European agricultural policy has evolved greatly since the early 1990s with respect to both the expansion of its objectives and the measures used to achieve them. While productivity, farm income, and affordable food prices were objectives from the very beginning, environmental and social objectives were increasingly introduced in the Common Agricultural Policy (CAP) (Pe'er et al. 2019).
Current agricultural policies aim to ensure the provision of safe, nutritious, and affordable food, to reduce the negative impact of production on the environment, to increase animal welfare, and to promote viable rural livelihoods. These objectives apply to the European Union's CAP and to the agricultural policies of other European countries, such as Switzerland (El Benni and Lehmann 2010; Pe'er et al. 2019). To reach these objectives, policy measures have been more targeted towards specific goals and have increasingly been tailored to specific farms (Matthews 2013; Finger and El Benni 2021). However, despite high governmental spending on European agricultural policies, most agri-environmental, social, and animal welfare objectives in particular, but also [...]
Fig. 1. The role of research in policymaking. Phases of the policy cycle: policy design and preparation, adoption, implementation (transposition, complementary non-regulatory actions), application (including monitoring and enforcement), evaluation and revision (EC 2021a).
requires a comprehensive portfolio of methods to analyse the impacts of agricultural policies and provide evidence for policymaking. To strengthen policy evaluation throughout the policy cycle, the EU's Better Regulation Agenda was introduced in 2015 (Listorti et al. 2020). According to the agenda, '"Better regulation" refers to the Commission's regulatory policy, whereby it seeks to design and prepare EU policies and laws in such a way that they achieve their objectives in the most efficient way' (EC 2021a). Because evaluation¹ is among the key elements of regulatory policymaking (OECD 2018), the Better Regulation Guidelines are accompanied by a toolbox for ex-ante and ex-post evaluations (EC 2021b).
As shown in Fig. 1, researchers can provide evidence to policymakers in both ex-ante and ex-post evaluations. For example, this evidence can comprise insights on policy outcomes, impacts, and underlying behavioural changes (mechanisms), as well as provide insights for cost-benefit and cost-effectiveness analysis.² Methods for ex-ante evaluations include simulation and optimization models and behavioural experiments. Quasi-experimental designs, replication studies, systematic reviews, and meta-analyses are methods used in ex-post evaluations. Establishing a causal link between a specific policy and observed changes (i.e. attribution) requires quantitative analysis. However, when financial resources for an impact evaluation or the number of observations are limited, qualitative contribution analysis can be used to inform decisions (White and Raitzer 2017). Ultimately, combining insights into policy processes from qualitative analyses with the impact estimates from quantitative evaluations provides a comprehensive assessment for policymaking. Researchers both use and collect data for their evaluations. Commonly used data sets include farm accountancy data and agricultural census data, survey data, and synthetic and modelled data. In addition, digitalization is increasingly making new data sources available, such as high-resolution remote sensing data, which will become more important for research in the future. Based on these data, farm- and sector-level indicators are developed to support the setup of monitoring systems and the design of new types of policy measures, such as payment schemes for environmental services (e.g. Latruffe et al. 2016; Poppe et al. 2016; Poppe and Vrolijk 2017; Elmiger et al. 2023; Gilgen et al. 2023a).
However, a broad range of scientific methods alone is not sufficient to inform policymaking. Notably, the impact of science depends on the demands of policymakers and practitioners, the evidence supplied by researchers, and the alignment between the two (McNie 2007). To ensure that scientific evidence generates real-world impacts on policymaking and to promote the uptake of measures by farmers, an actor-centred approach should be taken when constructing ex-ante and ex-post evaluations. Hofmann et al. (2022) distinguish three types of actors who influence the impact of scientific evidence on policymaking. The first is truth-seeking actors, who base their decisions on available scientific evidence to identify and select a pathway towards sustainable transformation. Science can support these actors by providing more and better evidence (Haas 2004; Montpetit and Lachapell 2015). The second type is sense-making actors, who integrate scientific evidence into their belief systems (Dewulf et al. 2020). Thus, the impact of science on preferences depends on whether the results of an evaluation match the demand, namely the beliefs and individual experiential knowledge of these actors (Raymond et al. 2010). The third type, utility-maximizing actors, uses scientific results strategically to pursue predefined interests, substantiate their preferences in political conflicts, and change others' perceptions (Weiss 1979; Choi et al. 2005).
For rigorous scientific evidence to actually affect policy designs and decisions, all three actor types must be considered. Evidence-based policy changes can only be expected if the strategic demand of these actors is met, that is, if the supply of and demand for evidence match and if actors are interested in the necessary transformation (Hofmann et al. 2022).
The actor-centred approach applies not only to research conducted on public policies but also to the evaluation of corporate policies and policy measures by non-governmental organizations and private donors. Researchers should factor this approach into their evaluations by engaging in knowledge co-production (Norström et al. 2020) and accessing multiple evidence bases that draw on different knowledge systems (Tengö et al. 2014). Although knowledge-production activities in transdisciplinary research span disciplinary boundaries and meaningfully involve non-academic partners in research design, operation, analysis, and publication, the integration of systematic policy evaluations and transdisciplinary research remains largely unexplored (O'Donovan et al. 2022).
In the next section, we present methods commonly used for ex-ante and ex-post policy evaluations and provide examples from evaluations conducted in various European countries.

Methods for ex-ante and ex-post policy evaluations to support policymaking
The majority of ex-ante and ex-post policy-evaluation methods are based on the establishment of a valid counterfactual to allow for causal interpretations of policy impacts. In this section, we discuss (i) ex-ante policy-evaluation approaches based on both simulations and economic experiments and (ii) ex-post evaluations based on quasi-experimental approaches and methods to synthesize results, such as meta-analyses, systematic reviews, and replication studies. Finally, we discuss (iii) qualitative methods for both ex-ante and ex-post assessments.

Ex-ante evaluations based on optimization and simulation models
Ex-ante policy assessments are used to provide guidance on the expected costs and benefits of different policy options and their redistributive impact.
Quantitative ex-ante assessments of European agricultural, environmental, and trade policy often rely on partial-equilibrium models that operate at the national, continental, or global scale. Examples include CAPRI, short for CAP Regionalized Impact (Britz and Witzke 2008; CAPRI model documentation 2022), Aglink-Cosimo, which analyses the supply and demand of world agriculture (Burrell and Nii-naate 2013), and the Global Biosphere Management Model (GLOBIOM), which allows users to consider the competition for land use between the agriculture, forestry, and bioenergy sectors and to account for impacts on carbon stocks and greenhouse gas emissions (Havlík et al. 2018; Pastor et al. 2019). The individual models can also be combined. For instance, Latka et al. (2021) analysed the effectiveness of consumer-side interventions towards a more sustainable agri-food system using three economic models: CAPRI, GLOBIOM, and the Modular Applied GeNeral Equilibrium Tool, or MAGNET (Woltjer et al. 2014). MAGNET is a multi-regional, multi-sectoral, general-equilibrium input-output model that links industries across the agri-food value chain.
These models were initially designed to estimate supply responses to changing market interventions but not to capture behaviour and impacts at the farm level, and thus they often cannot reflect the emerging phenomena arising from a system of farms and farm-level decisions (Colen et al. 2016). To better represent farmers' responses to policy changes, sectoral models, such as CAPRI, have increasingly been disaggregated, for example, by refining spatial aggregation and going from global to farm-type scale (Gocht and Britz 2011; Gocht et al. 2017). Heterogeneous farm-specific behaviour, however, especially that related to enrolment in voluntary agri-environmental programmes, is not captured by these aggregated (farm-type scale) models. Along these lines, most of the models available for the ex-ante evaluation of the CAP are implemented at a regional level and thus cannot capture the impacts of policy measures that are farm- and site-specific (Louhichi et al. 2015).³ However, a better understanding of farmers' behaviour in response to agricultural-policy interventions has become much more important since the change from market support to decoupled direct payments, because agricultural-policy instruments are increasingly targeted towards specific policy goals and tailored to specific farms (Finger and El Benni 2021). The individual farm is thus the most relevant unit of decision-making, and farm-level models are crucial for ex-ante agricultural-policy evaluations (Reidsma et al. 2018). Several farm-level models are currently used for policy analysis in Europe (see Reidsma et al. 2018 for an overview). Modelling a few individual farms may not always be sufficient to inform policymakers, however, because the potential to upscale findings from case studies to countries or regions is often limited.
Agent-based models (ABMs) are an option to overcome this limitation and to model farm-level decisions while still being able to infer sector-level outcomes. These models allow users to better represent farmers' responses to changing policy measures (Balmann 1997; Berger 2001). Some of the distinctive features of ABMs are the heterogeneity of the agent population (e.g. diverse farms, supply chain actors, and consumers), the ability for spatial differentiation, and the consideration of interactions or transactions between agents, as well as between agents and the landscape, such as those related to social networks, markets, or water management (Schreinemachers and Berger 2006; Nolan et al. 2009; Kremmydas et al. 2018; Huber et al. 2021). The demand for the development and use of ABMs is increasing. Currently, however, ABMs are rarely used to evaluate the whole portfolio of agricultural policies because of their specificity: they may have been developed for a particular purpose or for a particular spatial scale and are not easily adaptable to other applications and contexts (Louhichi et al. 2010, 2013; Ciaian et al. 2013; Grovermann et al. 2017; Kremmydas et al. 2018). One of the few exceptions for a national-level ABM is the Swiss agent-based recursive-dynamic sector model 'SWISSland', which has been the standard tool used for Swiss agricultural-policy impact assessments since 2011 (Möhring et al. 2016). Along these lines, the micromodel designed for the ex-ante economic and environmental assessment of the medium-term adaptation of individual farmers to policy and market changes, known as IFM-CAP (Individual Farm Model for CAP Analysis), uses an individual farm-level simulation-model approach to overcome the shortcomings of agricultural-modelling tools for ex-ante evaluations of the CAP (Louhichi et al. 2015).
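To make the idea of inferring sector-level outcomes from heterogeneous farm-level decisions more concrete, the following minimal sketch simulates enrolment in a voluntary scheme. It is purely illustrative and far simpler than SWISSland, IFM-CAP, or any other model cited above: the function name `simulate_enrolment`, the decision rule, the peer effect, and all numerical values are assumptions made for this example only.

```python
import random


def simulate_enrolment(payment, n_farms=200, peer_effect=50.0, steps=5, seed=1):
    """Return the share of farms enrolled in a voluntary scheme.

    Hypothetical decision rule: a farm enrols when the payment per hectare,
    plus a bonus that grows with the sector-wide enrolment share of the
    previous period (a stylized agent interaction), covers its own
    opportunity cost.
    """
    rng = random.Random(seed)
    # Heterogeneous agents: each farm has its own opportunity cost
    # (hypothetical values in currency units per hectare).
    costs = [rng.uniform(100, 500) for _ in range(n_farms)]
    enrolled = [False] * n_farms
    for _ in range(steps):
        share = sum(enrolled) / n_farms  # sector-level outcome last period
        enrolled = [payment + peer_effect * share >= cost for cost in costs]
    return sum(enrolled) / n_farms


if __name__ == "__main__":
    # Compare hypothetical payment levels, as in an ex-ante scenario analysis.
    for payment in (150, 250, 350):
        print(payment, simulate_enrolment(payment))
```

Because every farm decides individually, the sector-level enrolment share emerges from the distribution of costs rather than from a single representative farm, which is the feature that distinguishes ABMs from aggregated models.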
According to Reidsma et al. (2018), one limitation of most farm-level models (as well as those used in ABMs) is their reliance on programming approaches (i.e. a constrained optimization model is solved to simulate actual behaviour) that may have limited capacities to describe actual and potential farmer behaviour. Reidsma et al. (2018) highlight positive mathematical programming (where calibrated parameters are used to model actual behaviour) as a viable strategy to overcome this drawback in policy-impact assessments. However, some linear-programming-based ABMs include calibration steps so that actual behaviour is simulated (Troost and Berger 2015). Many ABMs are also based on heuristics and not on optimization (Becu et al. 2008). One exception is the IFM-CAP, which is based on information from the Farm Accountancy Data Network (FADN) and is calibrated for an average of 3 years, using positive mathematical programming (Louhichi et al. 2015). In one of the few studies of this kind, Mack et al. (2019) showed how linear-optimization approaches can be combined with positive mathematical programming approaches in agent-based agricultural-sector models to improve their forecasting performance. Huber et al. (2018) reviewed 20 ABMs for their representation of decision-making processes and found considerable room for improvement by combining existing modelling approaches and promoting model inter-comparisons. Despite these challenges, simulation models remain an important tool for ex-ante policy evaluation.

Ex-ante evaluations based on behavioural experiments
Behavioural experiments have a large potential to provide helpful ex-ante insights into the efficacy and efficiency of policies as well as into underlying behavioural mechanisms (Palm-Forster and Messer 2021). While such experiments can be less costly for ex-ante evaluation than developing tailored, complex modelling solutions, economic experimentation with farmers for policy analysis in Europe is still in its infancy (Thoyer and Préget 2019; Dessart et al. 2021).
In economic experiments, participants are randomly assigned to either the treatment (treated with a specific policy) or control (not treated with the policy) group, and decision-making is then compared between the two groups to assess the effect of the policy. Variants of a policy can also be tested by including different treatment arms. Economic experiments differ in terms of the subjects considered in agricultural-policy evaluations (e.g. students versus farmers), the environment in which the experiment takes place (lab, lab-in-the-field, or natural context), and the type of experimental setting, such as discrete choice, games, or real-world policy implementation (Colen et al. 2016). Below, we discuss three major forms of experiments: randomized control trials (RCTs), laboratory experiments, and field experiments.
RCTs are experiments conducted in real-world settings and are often considered the gold standard for measuring the net impact of a policy. But RCTs also come with limitations, including resource and time intensity, limited applicability for many real-life agricultural-policy settings, and various ethical considerations, including the acceptance of RCTs by farmers (Colen et al. 2016; Morawetz and Tribl 2020). The random assignment of participation, however, ensures that the observed characteristics of the treatment and control groups are the same before treatment (unobserved characteristics are assumed to be the same), thus avoiding selection bias (Colen et al. 2016).
RCTs are thus well suited for ex-ante assessment of the causal impact of specific policy programmes before they are scaled up to the entire population, and they have been increasingly used to assess the piloting of innovative policy measures in development economics (Banerjee et al. 2016; Duflo 2020). RCTs can be used to evaluate specific policy measures (such as specific agri-environmental programmes) but are unsuitable for assessing broad policy reforms and are generally not applied to evaluate large-scale policy programmes, such as the entire CAP (Colen et al. 2016; Behaghel et al. 2019). Even though RCTs could be powerful instruments to assess the effectiveness of policies by randomly introducing potentially relevant variations in design (Behaghel et al. 2019), they are often difficult and costly to implement, especially if the aim is to analyse the reasons for behavioural change in addition to the overall impact of a policy programme.⁴ Maintaining randomization for evaluating long-term outcomes and the need for more than one treatment arm in many evaluations are examples of such difficulties. As with several other evaluation designs, the issue of statistical power is crucial in RCTs. If the sample is too small, the impact evaluation will be underpowered, which is a particular issue with several treatment arms (White and Raitzer 2017). This situation may lead to a high risk of not finding a statistically significant effect even though the policy is actually effective. Adaptive experimental designs can help to overcome the problem (Kasy and Sautmann 2021; Jobjörnsson et al. 2022). Another key concern is related to spill-overs between the treatment group and the control group. While clustered RCTs are often used to mitigate this risk, they also aggravate the problem of lower statistical power in settings with strong cluster heterogeneity.
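The power considerations above can be illustrated with a back-of-the-envelope sample-size calculation. The sketch below uses the textbook normal approximation for comparing the means of two equally sized arms; the Bonferroni adjustment for several treatment arms and the design effect 1 + (m - 1) × ICC for clustered designs are common simplifications chosen here for illustration, not a procedure prescribed by the studies cited. The function name `n_per_arm` and all default values are assumptions.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.8,
              n_comparisons=1, cluster_size=1, icc=0.0):
    """Approximate number of participants needed per arm.

    effect_size   -- standardized mean difference (difference / std. dev.)
    n_comparisons -- treatment-vs-control comparisons (Bonferroni-adjusted)
    cluster_size  -- participants per cluster (1 = individual randomization)
    icc           -- intra-cluster correlation of the outcome
    """
    z = NormalDist()
    alpha_adj = alpha / n_comparisons            # simple Bonferroni correction
    z_alpha = z.inv_cdf(1 - alpha_adj / 2)       # two-sided test
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    design_effect = 1 + (cluster_size - 1) * icc  # inflation due to clustering
    return math.ceil(n * design_effect)


if __name__ == "__main__":
    print(n_per_arm(0.2))                             # one arm, no clustering
    print(n_per_arm(0.2, n_comparisons=3))            # three treatment arms
    print(n_per_arm(0.2, cluster_size=20, icc=0.05))  # clustered design
```

Under these assumptions, both additional treatment arms and clustering raise the required number of farmers per arm, which is why recruiting sufficiently large samples is often the binding constraint.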
Laboratory experiments are used to observe the behaviour of participants in a highly controlled environment isolated from nuisance factors; tasks and choices are usually formulated in an abstract way, thus allowing replication and enhancing internal validity (Lefebvre et al. 2021). For agricultural-policy evaluation, such experiments are mainly applied among university students (Le Coent et al. 2014; Schilizzi and Latacz-Lohmann 2016), which tends to limit the external validity of their results.
Field experiments with farmers are highly relevant economic experiments for agricultural policy evaluation (Lefebvre et al. 2021). The design of the experiment can differ substantially, depending on the evaluator's interest and whether hypothetical/stated preferences or experiments with incentives are used. For instance, discrete choice experiments can be used to provide farmers with a range of hypothetical policy scenarios, from which they choose their preferred options (Birol and Koundouri 2008; Mariel et al. 2021). A framing of the decision context can also be used as a treatment in field experiments in which participants conduct specific tasks with monetary incentives under different treatments, such as policy designs (Hermann et al. 2017; Thomas et al. 2019; Dessart et al. 2021). Recruiting large and representative samples of farmers is often a challenge (Weigel et al. 2020), and self-selection and evaluation bias remain limitations in field experiments, in particular when incentives are small and when treatments interact with characteristics that also affect selection into the study (Krawczyk 2011; Abeler and Nosenzo 2015). Furthermore, preferences for different policy designs may differ between the hypothetical experimental setup and the real world.
Experimental approaches can suffer from evaluation bias if participants are aware of the experiment and then change their behaviour because their decisions are being recorded. Changes in behaviour can occur for a variety of reasons: because (1) participants want to manipulate outcomes, (2) the treatment group works harder than normal (the Hawthorne effect), (3) the control group starts competing with the treatment group (the John Henry effect), or (4) participants' perceptions of what the evaluator is trying to test lead to behavioural changes (Colen et al. 2016). Avoiding evaluation bias is more difficult in discrete-choice, lab, and field experiments than it is in RCTs (Colen et al. 2016).
El Benni et al.

Ex-post evaluations based on quasi-experiments
Ex-post evaluations allow policymakers to measure the net impact of a policy and establish the reasons for success or failure (Colen et al. 2016). This information then serves as the basis to abandon, adjust, or upscale policy measures. Various econometric methods may be used to identify causal effects by showing whether agricultural and food policies have had an impact and by quantifying the effects of policy interventions, for example, on production, incomes, prices, and environmental effects.
Quasi-experimental designs allow users to draw causal inferences based on observational data. The most common approach is to establish treatment and control groups in an ex-post manner. Baseline and endline data, collated before and after the intervention (respectively), or even more detailed time-series information, might be available for the two groups and can facilitate causal analysis, but such information is not mandatory.
Analysis may be performed by difference-in-difference, which compares the changes before and after the treatment between the treated and untreated units over the same time period (Imbens and Wooldridge 2009; Iacus et al. 2012, 2019; Chabé-Ferret 2015). Other approaches include regression-discontinuity designs (Cattaneo and Escanciano 2017; Wuepper and Finger 2022), synthetic controls (Abadie et al. 2015; Adhikari and Alm 2016), or instrumental variables (Angrist et al. 1996). These approaches can be combined with matching/weighting, which compares the outcomes of treated versus untreated units with the same observed characteristics. These methods are all suitable for ex-post agricultural-policy evaluations and have been applied in various studies on European policies (Chabé-Ferret and Subervie 2013; Mack and Kohler 2018; Bertoni et al. 2020; Wuepper et al. 2020, 2021; Grovermann et al. 2021; Wuepper and Huber 2021). Combinations of approaches are also increasingly common. For example, differences-in-discontinuity designs combine regression-discontinuity designs with difference-in-difference approaches (Wang et al. 2022; Wuepper and Finger 2022). Other options, such as doubly robust difference-in-difference estimators (Sant'Anna and Zhao 2020) and regression-discontinuity-like designs (Cattaneo and Titiunik 2022), including regression-kink designs (Card et al. 2015, 2017), bunching, and density discontinuities (Kleven 2016; Jales and Yu 2017; Blomquist et al. 2021), are available but have yet to be used in European agricultural and food-policy evaluations.
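As a minimal illustration of the difference-in-difference logic described above, the sketch below computes the basic two-group, two-period estimate from group means. The data are invented for the example, the function name `did_estimate` is an assumption, and the estimate has a causal interpretation only under the usual parallel-trends assumption; applied studies would typically use regression-based estimators with standard errors and covariates.

```python
from statistics import mean


def did_estimate(records):
    """Two-group, two-period difference-in-difference estimate.

    records -- list of (treated, post, outcome) tuples, where `treated`
               and `post` are booleans and `outcome` is numeric.
    """
    def cell_mean(treated, post):
        # Mean outcome in one of the four group-by-period cells.
        return mean(y for t, p, y in records if t == treated and p == post)

    change_treated = cell_mean(True, True) - cell_mean(True, False)
    change_control = cell_mean(False, True) - cell_mean(False, False)
    return change_treated - change_control


if __name__ == "__main__":
    # Hypothetical farm-level outcomes (e.g. an environmental indicator).
    data = [
        (True, False, 10.0), (True, False, 12.0),    # treated, before
        (True, True, 15.0), (True, True, 17.0),      # treated, after
        (False, False, 9.0), (False, False, 11.0),   # control, before
        (False, True, 11.0), (False, True, 13.0),    # control, after
    ]
    print(did_estimate(data))
```

Subtracting the change in the control group removes time trends common to both groups, so only the additional change among treated units is attributed to the policy.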
In many quasi-experimental ex-post evaluation studies, several important aspects cannot be assessed due to a lack of data. Examples include (1) the implementation costs of a given policy, (2) the unintended and deadweight effects, and (3) interactions with other policies and their interference with the behaviour of involved and affected agents, such as farmers' adoption of an agri-environmental programme (Esposti and Sotte 2013). In addition, ex-post evaluations do not always provide sufficient insights into the underlying mechanisms of change or the often-complex interactions between policy measures and farmers' behaviour. The use of mixed-method approaches can close this gap by combining evaluation with in-depth interviews and other enquiry techniques that allow researchers to address 'how' and 'why' questions more profoundly and to help with data triangulation.

Ex-post evaluations based on replication studies, systematic reviews, and meta-analyses
Replication studies are a powerful but under-utilized approach to providing evidence for policymaking. Many studies in economics and policy evaluation are not replicable (Ferraro and Shukla 2022; Finger et al. 2023), which has ramifications for the significance, direction, and effect size reported in original studies. For example, Camerer et al. (2016) showed that 40 per cent of economic experiments published in top economic journals could not be replicated. Only a small fraction of papers are ever replicated; one study found that only 0.1 per cent of papers published in the 50 leading journals in economics were replication studies (Mueller-Langer et al. 2019). This situation also matters for the credibility of the scientific knowledge that should be used to design effective and efficient agricultural, food and environmental policies (Ferraro and Shukla 2020). Policies might be initiated or abandoned based on studies where the true impacts are actually different or less reliable than reported (Ferraro and Shukla 2020). The replication of policy-evaluation studies can thus be used to verify results, reveal underlying uncertainties, or uncover errors, and thereby provide a better scientific basis for policymakers. A special issue on 'Replications in Agricultural Economics' in Applied Economic Perspectives and Policy provides the first examples of relevant replication studies (Finger et al. 2023). Open research principles (e.g. open data, open code, and open access) are key to enabling a required shift towards a replication culture and increasing the usability of and trust in agricultural economic research.
Systematic reviews are an important tool for synthesizing the knowledge and scientific evidence about a specific research question (Page et al. 2021). For instance, the International Initiative for Impact Evaluation (known as '3ie') was established in 2008 to systematically synthesize rigorous evaluation evidence, with evidence-gap maps on various policy issues being an important product. Some widely used guidelines for systematic reviews include the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Page et al. 2021). PRISMA allows researchers to coherently extract findings from existing research, critically assess underlying studies, and synthesize the knowledge base into an overarching conclusion. PRISMA also allows users to draw from a wide range of underlying studies, including those that use different methods for the same question, such as simulation models and experimental approaches.
Exemplary questions addressed in systematic reviews of agricultural policies include the following: [...]
Meta-analyses provide a systematic approach to empirically synthesize the results from multiple studies on the same question. Researchers can thus synthesize diverging effects and can consider any underlying uncertainties in individual studies. Combining studies also allows researchers to overcome the lack of power of individual studies and to identify evidence gaps. The use of meta-analysis can increase researchers' understanding of policy questions by integrating a large body of research focused on policy issues. Such analyses can provide combined effect sizes and combined significance levels for the joint outcomes of multiple studies, thus providing more reliable estimates of the effects of policy actions than individual studies can. Examples include Böcker and Finger's (2017) meta-analysis on pesticide-demand elasticities, Condon et al.'s (2015) meta-analysis of the impacts of ethanol policy on corn prices, and Santeramo and Lamonaca's (2019) meta-analysis on the effects of non-tariff measures on agri-food trade.
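How a combined effect size arises from several studies can be sketched with inverse-variance weighting, the basic building block of a fixed-effect meta-analysis. The numbers below are invented, the function name `pooled_effect` is an assumption, and the sketch is not the procedure used in the meta-analyses cited above; when effects differ substantially across studies, random-effects models or meta-regressions are usually more appropriate.

```python
import math


def pooled_effect(estimates, std_errors):
    """Fixed-effect (inverse-variance weighted) pooled estimate.

    estimates  -- effect estimates from the individual studies
    std_errors -- their standard errors, in the same order
    Returns (pooled estimate, pooled standard error).
    """
    # More precise studies (smaller standard errors) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical elasticity estimates from three studies.
    effects = [-0.30, -0.50, -0.40]
    ses = [0.10, 0.10, 0.20]
    estimate, se = pooled_effect(effects, ses)
    print(round(estimate, 3), round(se, 3))
```

The pooled standard error is smaller than that of any single study, which reflects the gain in statistical power from combining studies.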
The transparent presentation of data and methods, as well as the accessibility of the original data used in individual studies (such as via data repositories), are all prerequisites to enable replication studies and meta-analyses while also supporting systematic reviews. This situation highlights the importance of having open research data for better policy designs (Nosek et al. 2015).

Qualitative approaches for ex-post and ex-ante policy evaluations
The focus of impact-evaluation approaches has mostly been on large-n statistical designs and attribution analyses (White and Raitzer 2017), which implies that a large number of observations are available for testing the statistical significance of the difference in outcomes between treatment and control groups. In some instances, only a small number of units are exposed to a policy, for instance in the case of regional regulations for selected communities or of financial support to specific enterprises. In other instances, resources are limited, the setting is highly complex, or the evaluation focus is on the change process. In these cases, qualitative approaches are useful to analyse the contribution of a policy to the outcomes of interest. Such methods often rely on systematic analyses of the theory of change and the impact pathway (Bamberger and Mabry 2019).
Qualitative comparative analysis is a methodological approach to contribution analysis that examines patterns in the data to identify necessary and sufficient conditions for relationships between interventions and outcomes without performing any tests of statistical significance (Ragin 2008; Pattyn et al. 2017). Other methods, such as 'most significant change' or 'outcome harvesting', support the reconstruction of the impact pathway to identify and assess policy outputs, outcomes, and impacts by considering the contributions of multiple stakeholders, programmes, and contextual factors (Alvarez et al. 2010; Blundo-Canto et al. 2017; Douthwaite et al. 2017). To trace the trajectory of outcomes, qualitative data-collection methods include key-informant interviews, focus groups, ethnographic research, or simulation games with participant observation, among others (Stern et al. 2012; Hennink et al. 2020).
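The search for necessary and sufficient conditions mentioned above can be illustrated with the consistency measures commonly used in crisp-set qualitative comparative analysis. The sketch below is a simplified illustration with invented cases; the condition names, the outcome, and the function names are assumptions, and a full analysis would also examine combinations of conditions and coverage.

```python
def sufficiency_consistency(cases, condition, outcome):
    """Share of cases with the condition that also show the outcome."""
    with_condition = [c for c in cases if c[condition]]
    if not with_condition:
        return None  # the condition never occurs in the data
    return sum(1 for c in with_condition if c[outcome]) / len(with_condition)


def necessity_consistency(cases, condition, outcome):
    """Share of cases with the outcome that also show the condition."""
    with_outcome = [c for c in cases if c[outcome]]
    if not with_outcome:
        return None  # the outcome never occurs in the data
    return sum(1 for c in with_outcome if c[condition]) / len(with_outcome)


if __name__ == "__main__":
    # Hypothetical regions: was advisory support offered, was a payment
    # offered, and was uptake of the measure high?
    regions = [
        {"advisory": True, "payment": True, "high_uptake": True},
        {"advisory": True, "payment": False, "high_uptake": True},
        {"advisory": False, "payment": True, "high_uptake": False},
        {"advisory": True, "payment": True, "high_uptake": True},
        {"advisory": False, "payment": False, "high_uptake": False},
    ]
    for cond in ("advisory", "payment"):
        print(cond,
              sufficiency_consistency(regions, cond, "high_uptake"),
              necessity_consistency(regions, cond, "high_uptake"))
```

A consistency of 1.0 for sufficiency means that every case with the condition shows the outcome; a consistency of 1.0 for necessity means that the outcome never occurs without the condition. No statistical test is involved, in line with the small-n logic of the approach.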
Qualitative research also allows for inductive analysis (i.e. to go from specific cases to the general) and for generating hypotheses regarding the underlying mechanisms of policy impacts. Such research thus complements quantitative approaches, which are usually deductive and are used to test hypotheses. For instance, based on a qualitative multi-method research design and practice theory, Kaiser and Burger (2022) identified five types of routinized crop-protection practices in Swiss agriculture, which implied different responses of farmers to current incentive-based agri-environmental policy instruments.
In addition to policy advice from either a quantitative or qualitative evaluation design, evidence-based decision-making can benefit from mixed-methods approaches. Such methods incorporate a diversity of values, allow for better validation of data and participatory-evaluation elements, and extend the comprehensiveness of evaluation findings through results from different methods that can then be used to broaden and deepen the understanding of policy impacts and impact pathways (Bamberger and Mabry 2019).

Data requirements
A wide range of micro-level data is required to provide scientific evidence for policymaking by ex-ante and ex-post evaluations, including information on economic, environmental, and social aspects, preferably at the farm level (Poppe and Vrolijk 2017). Farm-level information is crucial for assessing and monitoring the achievement of agricultural-policy goals. The farm as a unit of decision-making contributes to several functions of agriculture, including economic, through the production of goods and services, ecological, through the management of natural resources, and social, by contributing to rural dynamics (Latruffe et al. 2016).
To monitor and evaluate agricultural policy regarding the economic situation at the farm and sector level, farm-level data from the FADN is available for researchers and policymakers. Although attempts have been made to expand FADN data to include environmental and social performance indicators (Andersen et al. 2007; Latruffe et al. 2016; Poppe et al. 2016; Vrolijk et al. 2016), there is as yet no systematic collection of farm-level indicators for comprehensive analysis and monitoring of the economic, environmental, and social functions of agriculture. Kelly et al. (2018) showed that the FADN in principle is a relevant but incomplete platform to assess farm-level sustainability across the EU; it is incomplete because using FADN data implies that financial variables serve as a proxy for environmental effects, such as the expenses of fertilizer and pesticides per hectare of land (Uehleke et al. 2019; Stetter et al. 2022). Such a proxy can only be a rough approximation of the environmental effects of the input-allocation decisions of farmers; for example, expenditures on fertilizer do not necessarily reflect the amount of fertilizer applied, and even less the nutrient losses to the environment. Along these lines, pesticide expenditures do not necessarily reflect the risk that pesticides pose to the environment (Möhring et al. 2019). Nevertheless, within a set of various indicators, such approximations can add useful information for policymaking (Kelly et al. 2018). The next step will be the expansion of the FADN towards the Farm Sustainability Data Network, or FSDN (Vrolijk and Poppe 2021).
Farm-level information can also be retrieved from agricultural census data, but such data mainly comprise restricted sets of indicators such as the type of agricultural production (e.g. the number of livestock units or hectares of crops grown), land-use-allocation decisions (e.g. the share of hectares devoted to specific crops), zone of production (e.g. the 'less favoured areas' or altitude), or participation in voluntary agri-environmental and animal welfare schemes. But census data usually do not comprise specific outcomes in the economic, environmental, animal welfare, or social dimensions. Other challenges are that census data are not always available for all countries for each year, and that the coverage of agricultural censuses can be geographically and/or statistically restricted (FAO 2021).
In general, there is a need to expand the in-situ monitoring of land use, biodiversity, ecosystem services, and human well-being (Pe'er et al. 2019). Besides the established and constantly improving monitoring systems used in Europe to assess the impact of agricultural policies, remote sensing and digitalization are improving the availability of high-resolution data. The increasing availability of high-quality data and ever-increasing computing capacity have allowed a revolution in the application of data-intensive evaluation methods, such as regression-discontinuity-like designs, to large data sets. For example, high-resolution geospatial measurements of environmental, agricultural, or socio-economic variables using remote sensing or modelled data (including data generated with machine learning) have great potential for the evaluation of agricultural and food policies (Wuepper and Finger 2022; Jain 2020; Burke et al. 2021).
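To make the idea of a regression-discontinuity design concrete, the following minimal sketch estimates a jump in an outcome at a boundary by fitting a separate straight line on each side of the cutoff within a fixed bandwidth and comparing the two fits at the cutoff. The data are synthetic, and the names (`ols_line`, `rd_estimate`, `distance`, `TRUE_JUMP`) are illustrative assumptions. The sketch uses uniform weights and an arbitrary bandwidth and reports no standard errors, so it is a simplification rather than the procedure used in any of the studies cited here.

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff=0.0, bandwidth=1.0):
    """Difference between the right-hand and left-hand linear fits at the cutoff."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_left, _ = ols_line(*zip(*left))
    intercept_right, _ = ols_line(*zip(*right))
    return intercept_right - intercept_left


# Synthetic example: 'distance' is the signed distance of a plot to a policy
# boundary (negative = outside, positive = inside the treated zone).
rng = random.Random(42)
TRUE_JUMP = 2.0  # assumed treatment effect at the boundary
distance = [rng.uniform(-5.0, 5.0) for _ in range(2000)]
outcome = [1.0 + 0.3 * d + (TRUE_JUMP if d >= 0 else 0.0) + rng.gauss(0.0, 0.5)
           for d in distance]

estimate = rd_estimate(distance, outcome, cutoff=0.0, bandwidth=2.0)
print(f"estimated jump at the boundary: {estimate:.2f} (true value {TRUE_JUMP})")
```

With high-resolution geospatial data, the running variable would typically be the distance of each observation to an administrative or policy border. Applied work would also choose the bandwidth in a data-driven way and test the sensitivity of the estimate to that choice.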

Aligning research for evidence-based policymaking
As summarized above, numerous methods for the evaluation of policy measures are available and are constantly being developed further, and policymakers increasingly demand scientific evidence. But further efforts are necessary for evidence-based policymaking, both from science and from policy: while the further development and use of methods must be geared towards the issues relevant to policymaking, the different stakeholders affected by a policy measure must be involved in the design and implementation of new measures.
We propose five directions for the improvement of research towards evidence-based policymaking. First, the rigorous scientific methods chosen for policy evaluation and design need to be balanced with the relevance of the questions that can be answered by these methods. Second, methodological developments for policy evaluation and design should increasingly focus on the combination and triangulation of different methodological approaches. Third, capacities for evidence-based policy and professional practices must be increased by recognizing the functions of different actors (such as researchers, policymakers, interest groups, and civil society) and involving them in the policy cycle, according to their responsibilities. Fourth, new data sources should be developed to be able to provide the necessary scientific evidence. Fifth, the transparency of how political decisions are made should be improved.

Balance between scientifically rigorous methods and the practical relevance of questions
The political (or practical) relevance of the questions that can be answered and the rigour of the analytical method must be balanced, both in ex-ante and ex-post policy evaluations. More precisely, even though very robust ex-post evaluation methods are available from research for identifying causal relationships, the requirements on data quality and quantity are high and are not always met in practice. As a result, not all policy-relevant questions can be addressed with the most sophisticated methods. Quasi-experimental methods also frequently focus on one specific measure or variant of a policy measure, so ex-post evaluation designs are often characterized by strong internal validity, but extrapolating outcomes to a larger population (i.e. external validity) can be an issue. This situation can lead to a lack of appreciation of this type of analysis by political decision-makers, mainly because such analyses only cover a very specific topic, no insights on the mechanisms of change can be provided, and no interactions with other policy measures are captured. But such analyses can have great value for policymaking if their results are systematically summarized (e.g. within the framework of meta-analyses or replication studies) and thereby produce more general results for a wider population. In order to generalize results through systematic summaries, a good description of the respective contextual factors in the individual studies is needed.
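How a systematic summary turns several narrow studies into a more general result can be sketched with the simplest form of meta-analysis, a fixed-effect inverse-variance pooling of effect estimates. The three study estimates and standard errors below are invented for illustration, and the function names are assumptions. Cochran's Q is included as a rough indicator of whether the studies differ by more than sampling error alone, which is where the description of contextual factors becomes important.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of study estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


def cochran_q(estimates, std_errors):
    """Weighted sum of squared deviations of the study estimates from the pooled one."""
    pooled, _ = pool_fixed_effect(estimates, std_errors)
    return sum(((e - pooled) / se) ** 2 for e, se in zip(estimates, std_errors))


# Hypothetical effect estimates of one policy measure from three case studies.
study_estimates = [0.10, 0.30, 0.20]
study_std_errors = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(study_estimates, study_std_errors)
q_stat = cochran_q(study_estimates, study_std_errors)
print(f"pooled effect: {pooled:.3f} (SE {pooled_se:.3f}), Cochran's Q: {q_stat:.2f}")
```

The pooled standard error is smaller than that of any single study, which is the statistical gain from summarizing. If Q is large relative to the number of studies minus one, a random-effects model or an explicit analysis of contextual moderators would be the more appropriate next step.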
One challenge in practice-relevant model-based ex-ante policy evaluations is that bundles of policy measures are often modelled or evaluated as a whole, and the effects of a single measure cannot always be separated from the effects of other measures. Even if the causal effects of individual measures thus cannot be analysed, the analysis of a bundle of measures is of great importance for policy design. Researchers thus should increasingly evaluate bundles of measures, in addition to rigorously identifying causal relationships.

Combination and triangulation of different methodological approaches
Both types of evaluation, ex-post and ex-ante, as well as both types of evidence, causal and contextual, must be part of the policy cycle and need to be better connected. To this end, any policy evaluation should be guided by a theory of change and should test the postulated relationships between interventions and outcomes.
Current developments of methods are still too often focused on individual methodological approaches, such as on quasi-experimental methods, especially with new econometric approaches, or, for example, on the designs of behavioural experiments. Ideally, further development should increasingly focus on bringing together different quantitative and qualitative methodological approaches, for instance, by integrating qualitative data on the change trajectory into quasi-experimental studies. This approach is important because the various objectives and measures of agricultural and food policy cannot be analysed with a single method alone. Researchers should strive for a mix of methods and data, and rather than merely applying various methodological approaches side by side, they should integrate these approaches into an overall design for policy evaluation by means of triangulation. As with the selection and use of individual methods for answering a policy-relevant research question, care must also be taken to ensure a transparent description of the chosen methodological approach when triangulating data and methods. This is particularly important because there may be variations in the results across the different methods, which can be a challenge in applying the findings in the policy process. However, as divergent results can lead to new and better explanations for the phenomenon under investigation (Tashakkori and Teddlie 2003), they should be presented and discussed transparently. Moreover, combining methods can reveal insights into both aggregate effects and the underlying mechanisms.
In addition to combining qualitative and quantitative methods, ex-post and ex-ante methodological approaches should more often be combined for developing and adapting policy instruments and thus support policymaking (Finger et al. 2017). This combination is particularly relevant in evaluations of the adoption of new sustainable production methods in agriculture, which often take place in contexts of limited data and high uncertainty (Möhring et al. 2022). Different tools and data should be combined for holistic impact evaluations, for instance by integrating data from life-cycle assessment (Gaillard and Nemecek 2009), other sustainability assessment tools such as SMART (Schader et al. 2016, 2019) or TAPE (Mottet et al. 2020), or environmental monitoring data (Gilgen et al. 2023a). 5 These strategies will allow the capture of trade-offs and synergies between a range of sustainability outcomes.
Filling gaps in the understanding of farmers' behaviour is also important in order to be able to include this information in, for example, model-based ex-ante policy evaluations. The combination of behavioural factors with bio-economic and agent-based modelling of agricultural production allows for the analysis of potential effects of policy measures on farmers' behaviour and the resulting policy outcomes and impacts (Reidsma et al. 2018; Huber et al. 2021).

Strengthen capacities for evidence-based policy and professional practices
The science-policy interface must be strengthened, and an actor-centred approach is necessary for scientific evidence to generate real-world impacts and inform policymaking (see Fig. 1).
Researchers can apply different methods to promote the use of scientific evidence in policy and practice and inform decision-makers (Hofmann et al. 2022). For instance, where truth-seeking actors are constrained by evidence gaps, the generation and accumulation of evidence can be improved through more interdisciplinary collaboration, knowledge networking, and syntheses of current knowledge (Topping et al. 2020). For meaning-oriented actors, evidence can be made more relevant by increasingly co-producing knowledge in transdisciplinary projects between interconnected actors from science, policy, and practice, also taking into account experiential knowledge (Norström et al. 2020). In the case of benefit-maximizing actors, transparency requirements can limit the strategic use or even misuse of evidence (Rohr 2021). The publication of all data collected with public funds can prevent private data monopolies.
The methodological approach and stakeholder interaction must also be aligned with the phase in the policy cycle. For instance, in the preparation phase, a theory of change should be formulated to clarify intended and possible unintended consequences and better contextualize the findings. Existing knowledge from past evaluations and analyses should also be increasingly used systematically, thereby avoiding duplication and allowing a more efficient use of the limited financial and human resources in research and in the political process. For example, a meta-analysis or systematic review can produce very good results without the need to adapt and extend models, which may be subject to uncertainties, especially when new policy instruments are evaluated. Formulating the theory of change should involve different stakeholders (such as practitioners, interest groups, and civil society) so that a comprehensive understanding of the expected direct and indirect impacts of a policy measure can be formed and to ensure the greatest possible benefits of policy changes. In general, as Pe'er et al. (2019: 451) note, 'monitoring and implementation processes should engage farmers, scientists, and citizens to better evaluate the impacts of interventions, to ensure delivery, and to promote societal inclusion, innovation, and adaptation management'.
During the adoption and implementation phases (and besides ex-ante modelling exercises), the joint design of economic experiments for ex-ante impact evaluations can be valuable to adapt and refine policy instruments. In this context, we should consider that this process takes time and that this time requirement must be explicitly integrated into the policy cycle. Assessments made in real-world policy advice settings often have limited or even no time for scientifically sound analyses. Ad hoc evaluation mandates frequently prevent a systematic and rigorous scientific approach, with corresponding implications for the significance of the results. Instead, we need forward-looking and future-oriented research that anticipates future developments and develops proposals for the attention of political stakeholders, who in turn can consolidate this scientific evidence in a systematic process with other stakeholders and finalize the proposals for implementation. A critical reflection is also needed on how and by whom the transfer of scientific knowledge takes place in governmental and policy processes, e.g. ministries, political parties, or entities at the science-policy interface such as think tanks, and on what useful interaction and evidence mean for these different actors. Moreover, science needs to reflect carefully on its role in the political process. For example, science as an honest knowledge broker provides evidence on advantages and disadvantages of possible policy choices while acknowledging that the actual policy decisions are made by policymakers and not by scientists themselves.

Analyse data from different sources with adequate methods
The availability of farm-level data is key to understanding the inter-linkages between agricultural policies, farmers' decision-making, and natural production conditions, and thus to designing and refining agricultural and food policies based on rigorous ex-ante and ex-post evaluations.
In general, farm-level monitoring tools should allow stakeholders to empirically document trends in a way in which developments can be attributed to the relevant policies and separated from other influences (Poppe et al. 2016). But while the economic function of agriculture has always been an important objective of agricultural policies that aim to ensure a sufficient food supply and an adequate income for farmers (Finger and El Benni 2021), the environmental, animal welfare, and social dimensions are underrepresented in current indicator sets and monitoring systems and should be expanded. Data collections such as FADN must consider the changing data needs driven by changes of agricultural-policy goals and instruments.
There is hope that these data gaps can be closed in the future, as various emerging data sources are currently under-used. These sources comprise geospatial and remote-sensing data, such as on land use and land-use intensity (Ehlers et al. 2021; Wuepper and Finger 2022); data from sensors on machinery, such as on yields and input use (Finger et al. 2019); and data from online sources extracted by web scraping (Hillen 2019), including data from social media and Google Trends, such as on societal concerns with agriculture (Schaub et al. 2020). To leverage data opportunities, a push towards open research data as well as open governmental data is required. With the dramatic increase in data availability, machine-learning approaches have also become more important for agricultural economists by helping to exploit large volumes of data more efficiently than traditional statistical methods allow (Storm et al. 2020). In policy evaluations, skills and capabilities are frequently neglected intermediate outcomes (O'Donovan et al. 2022). To assess these outcomes as part of policy-impact studies, indicators should be developed that can capture hard and soft skill development. The emerging field of behavioural experiments can contribute to filling this gap.

Improving the transparency and understanding of how political decisions are made
A major challenge for science to contribute to policymaking is the political process. Even when scientific evidence is available, it does not necessarily inform policy decisions. This is because strong political pressure can maintain the status quo and block policy changes (Swinnen 2018). For instance, by providing more and better evidence, science can support truth-seeking and sense-making actors. However, utility-maximizing actors will only use scientific evidence if it supports their own concerns (Hofmann et al. 2022). Even if researchers can hardly influence actors to use scientific evidence, they can scientifically accompany the policymaking process and create transparency about how decisions are made. The involvement of different actors throughout the policy cycle and the co-production of knowledge can lead to the actors recognizing the mutual benefits of the policy changes being discussed. This enables the negotiation of agreements that lead to policy changes. As the negotiation of agreements is a promising path to policy change (Metz et al. 2021), the scientific analysis of the policy process is also an interesting research topic regarding agricultural policy.

Papers in this special issue
Against this background, the papers in this special issue present a range of methods and data used for ex-ante and ex-post analyses of agricultural and food policies. The papers add to the literature by showing (1) how different data sources and indicators can be used to analyse agricultural and food policies and (2) how ex-post and ex-ante assessment methods can contribute to the understanding, design, and refinement of agricultural and food policies.
In a behavioural experiment in Germany, Rommel et al. (2023) show that the willingness of farmers to cooperate in the collective implementation of agri-environmental measures was higher than experts expected. Similar studies are to be conducted in the Netherlands, Hungary, and Poland, thus creating broad-based knowledge in different contexts on how measures need to be designed in order to effectively and efficiently achieve the environmental goals of agricultural policy.
Using a quasi-experimental approach with a spatial-regression-discontinuity design, Zimmert and Zorn (2023) show that direct payments increase family farm employment in Switzerland. The analysis points to not only economic but also social side-effects of the current direct-payment system because the additional labour force often consists of nonsalaried female household members. Without a wage, these family members are insufficiently protected socially, an issue that should gain importance in discussions of the further development of agricultural policy.
Fedoseeva and Irek (2023) show the added value of using online price data to obtain spatial and temporal market information on, in this case, food prices. In some countries, automated surveys of online prices are already used to measure inflation, for example, in Switzerland for books and clothing. Compared to the classic consumer price indices, which are usually calculated on a monthly basis, the advantage of this approach is that data are available in real time, and price changes are recorded in a more differentiated manner by product, location, and sales channel due to the large database (Cavallo and Rigobon 2016). In exceptionally dynamic times caused by pandemics or war, such real-time evidence can be valuable for policymakers.
Combining an ex-ante bio-economic modelling analysis with an ex-post econometric analysis based on survey data, Möhring et al. (2023) show that this combination has led to a better understanding of the adoption decisions of different types of farmers and provides stakeholders with precise information on the design and implementation of policy measures geared towards more sustainable production. They also show that the value of the different analyses differs, depending on the project stage and the round of the policy cycle, and that the choice of the model type and aggregation level in the ex-ante assessment is crucial to generating synergies in later policy cycles.
In an Italian case study, Santeramo et al. (2023) investigate whether and to what extent there are inter-linkages between a public policy reform (namely changes in the subsidization of insurance contracts) and a private-sector reform (changes in the types of insurance contracts). Even if no causal link between farmers' behavioural changes and a specific policy could be demonstrated, this study exemplifies the challenges of evaluating policies when behavioural changes are the result of different policy and market changes. Providing contextual information in such studies is particularly important in order to yield potentially generalizable information for policymakers, for example, in meta-analyses.
Bystricky et al.'s (2023) paper shows how model-based ex-ante evaluations can contribute to the design of agricultural policy measures; the authors use as an example a currently pending revision of a voluntary agri-environmental programme in Switzerland. Applying the aforementioned ABM SWISSland, supplemented with data from life-cycle assessment and further environmental indicators, the authors evaluate the direct and indirect effects at the national scale of a reduction in protein-reduced-concentrate use in roughage-based dairy and meat production. Since the evaluation shows that feed competition will be exacerbated by the proposed design of the measures and that no improvement in environmental impacts is to be expected, the measure will not be implemented as planned, but further design options will be examined.
In a Swiss case study commissioned by Switzerland's Federal Office for Agriculture, Gilgen et al. (2023b) show how environmental indicators can be used for the development of indicator-based agri-environmental payment programmes that better take into account environmentally relevant farm structures, as compared to current agri-environmental programmes. The authors analyse the effectiveness of the proposed indicator-based policy using SWISSland, but their results show hardly any effect on the environmental performance of the sector due to the manifold interactions with other existing direct-payment programmes.

(Gocht and Britz 2011; Gocht et al. 2013) and Agriculture, Recomposition de l'Offre et Politique Agricole, or AROPAj (De Cara and Jayet 2011).
4 Other experimental setups are available to researchers who wish to learn about the reasons for behavioural change. Choice experiments, for example, provide relatively cheap and rapidly obtained information on potential ways to improve a policy (Colen et al. 2016), with the trade-off that internal validity (i.e. the isolation of the causal relationship) is more challenging to maintain (Harrison and List 2004).
5 SMART and TAPE refer to 'Sustainability Monitoring and Assessment Routine' and 'Tool for Agroecology Performance Evaluation', respectively.

Figure 1. The role of research in policymaking. Phases of the policy cycle: policy design and preparation, adoption, implementation (transposition, complementary non-regulatory actions), application (including monitoring and enforcement), evaluation and revision (EC, 2021a).