There is copious literature on the development and validation of models to forecast risk to crops from arthropods and diseases; however, little has been published on the causes of failure associated with these models. This manuscript provides mechanistic model builders and users with a list of likely problems, potential causes, possible solutions, and associated references. The problems are divided into four categories: environmental inputs, model construction and parameterization, validation, and implementation. The list is based on the authors' extensive experience developing and running mechanistic modeling systems. A multidisciplinary approach involving researchers with expertise in pest biology, crop management, meteorology, and information technology is recommended for delivering the most effective pest forecast models.
Weather-based forecast models are commonly used for predicting risk to crops from arthropods and diseases (Magarey et al. 2002). The forecasts in turn are used for scheduling management operations including scouting and pesticide treatments (Isard et al. 2015). There is copious literature on the development and validation of models used to forecast risk to crops from arthropods and diseases (hereafter called pest models; Welch et al. 1981, De Wolf and Isard 2007); however, there is little literature on the causes of failure associated with these models. One reason for this dearth is that science seldom publishes negative results (Fanelli 2011). In addition, failures are rarely reported because researchers fear being perceived by the scientific community as inexperienced. This manuscript attempts to reduce the incidence of plant pest forecast model failure by providing novice model builders and users with a list of likely problems, potential causes, possible solutions, and, where available, associated references (Box 1). Although it provides context for common problems, the manuscript is not intended as a review of mechanistic plant pest models. Instead, it focuses on troubleshooting mechanistic models that simulate pest development. The list of problems examined is not exhaustive and is based on the authors' extensive experience developing and running mechanistic modeling systems such as the North Carolina State University/APHIS Plant Pest Forecast System (Magarey et al. 2015), the Generic Pest Forecast System (Hong et al. 2015), the Integrated Pest Management Pest Information Platform for Extension and Education (ipmPIPE; Isard et al. 2006, VanKirk et al. 2012), and the Integrated Pest Information Platform for Extension and Education (iPiPE; Isard et al. 2015). Many of the factors that complicate mechanistic pest modeling are also common to other types of pest prediction models.

The problems are divided into four categories: environmental inputs, model construction and parameterization, validation, and implementation. For the purposes of this manuscript, environmental inputs refer to the weather data required to build and run pest forecast models. Construction involves the process of designing or modifying a model framework to address a specific pest management need. Parameterization is the process of selecting parameter values to customize a model for a specific pest, host, and location. Validation is the process of confirming that a model is performing well for the target pest. Finally, implementation refers to the process of using the model in an operational setting to meet management needs.
There are two general types of modeling approaches commonly used for pest forecasting: mechanistic and empirical (statistical). The mechanistic approach tracks pest dynamics in a process model that simulates the development of the target organism and perhaps its host(s). The empirical approach quantifies statistical relationships between weather data and pest intensity or impacts observed in the field, and usually requires multiple years of observational data for development. Empirical pest forecasting models are not addressed in this manuscript. Madden et al. (2007) provide a comprehensive reference, drawing on numerous examples, for those interested in developing statistical models. The Fusarium head blight model, developed from 50 location-years of data using temperature, relative humidity, and rainfall as predictor variables, is an excellent example of a statistically based operational disease forecasting system (http://www.wheatscab.psu.edu/, accessed 19 October 2016) (De Wolf et al. 2003). The Tomato Spotted Wilt Virus and thrips vector model (Chappell et al. 2013) is also an excellent example of a statistically based pest forecasting system.
Environmental Inputs

There are three main options for obtaining weather data to build and run pest models (Magarey et al. 2001, Gleason et al. 2008). The first is to use on-site agricultural weather stations, the second is to employ data from a weather station network, and the third is to use simulated weather data from a numerical model. Each option has advantages and disadvantages. An on-site agricultural weather station can provide data that are specific to the location of interest; however, meteorological instruments require frequent maintenance and calibration and may provide poor quality data if not frequently and properly serviced. A second option is to use data from a public or private weather station network, freeing the user from the burden of maintaining the meteorological instrumentation. Examples of weather station networks include the Cornell University Network for Environmental and Weather Applications (http://newa.cornell.edu/, accessed 19 October 2016) and Washington State University AgWeatherNet (http://weather.wsu.edu, accessed 19 October 2016). Employing weather station network data to forecast pest risk is only viable where a network weather station is close by and thus its measurements represent well the environmental conditions at the location of interest. Where the distance to the nearest weather station is large, spatial interpolation procedures can be used to create data that better represent the target location within the networked area (Barnes 1964, Splitt and Horell 1998, Racca et al. 2010). A third option, which is becoming more prevalent, is to use simulated data from numerical weather models for the location of interest (Russo 2000, Magarey et al. 2001). Numerical weather models combine a variety of weather data sources including remote sensing, radar, and weather stations to make gridded weather products. Importantly, numerical weather model data have several advantages including low cost and availability for any location. These models provide data averaged for grid cells of varying dimensions. Grid cell coarseness may reduce the representativeness of simulated data for use at specific locations, especially in physically heterogeneous landscapes. In cases where the grid is too coarse, the data can be downscaled using simple techniques based on lapse rates for temperature (∼0.6°C/100 m) and dew point temperature (∼0.2°C/100 m) or, preferably, more complex numerical techniques that rely upon variable lapse rates (Royer et al. 1989, Benjamin et al. 2007). The quality and spatial resolution of numerical grid weather data are continually improving. For example, the Real Time Mesoscale Analysis (RTMA) data (http://www.nco.ncep.noaa.gov/pmb/products/rtma/, accessed 19 October 2016) used for pest modeling in the iPiPE (Isard et al. 2015) now have a resolution of 2.5 km compared with 5 km when released in 2007 (Benjamin et al. 2007).
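To make the simple downscaling concrete, the sketch below applies the constant lapse rates quoted above to adjust grid-cell air and dew point temperatures to a site elevation. The function and example values are ours and purely illustrative; operational systems preferably use variable lapse rates (Royer et al. 1989, Benjamin et al. 2007).

```python
# Minimal sketch (ours, not from the cited sources) of constant lapse-rate
# downscaling: grid-cell temperatures are adjusted to the site elevation
# using ~0.6 C/100 m for air temperature and ~0.2 C/100 m for dew point.

T_LAPSE_PER_100M = 0.6    # deg C decrease per 100 m elevation gain (air temp)
TD_LAPSE_PER_100M = 0.2   # deg C decrease per 100 m elevation gain (dew point)

def downscale(value_c, grid_elev_m, site_elev_m, lapse_per_100m):
    """Adjust a grid-cell value (deg C) to the site elevation: sites above
    the grid-cell mean elevation are cooled, sites below are warmed."""
    return value_c - lapse_per_100m * (site_elev_m - grid_elev_m) / 100.0

# Example: an orchard 250 m above its grid cell's mean elevation.
t_site = downscale(22.0, grid_elev_m=300.0, site_elev_m=550.0,
                   lapse_per_100m=T_LAPSE_PER_100M)
td_site = downscale(14.0, 300.0, 550.0, TD_LAPSE_PER_100M)
print(f"site T = {t_site:.1f} C, site Td = {td_site:.1f} C")
```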
A diligent user can employ a combination of these options; for example, a rain gauge and a simple weather station are an inexpensive way to supplement data from a weather station network or numerical weather model. Regardless of the source of weather data, users should be cognizant of the most common errors that impact data quality. Weather data errors are the most common cause of arthropod and disease forecast model failure because of the complexity of representing the plant microclimate, which is highly variable in space and time.
Leaf wetness is often an especially important variable for crop disease forecasting because many foliar pathogens are only active when plant surfaces are wet. However, obtaining useful leaf wetness data can be problematic: these measurements are typically not included in weather station networks (e.g., the Florida Automated Weather Network, fawn.ifas.ufl.edu), and accurate simulation of leaf wetness caused by dew requires information on canopy type, canopy closure, height in canopy, and local topography in addition to meteorological conditions. Procedures for measuring leaf wetness are presented by a number of authors (Sentelhas et al. 2004, Magarey et al. 2005b, Gleason et al. 2008). Models available for simulating leaf wetness include methods based on relative humidity (Sentelhas et al. 2008), fuzzy logic (Kim et al. 2002), nonparametric classification (Gleason et al. 1994), and a physically based approach that combines a surface water budget with an energy balance (Magarey et al. 2006). Some of these models can be calibrated for a crop using canopy parameters, such as height and leaf area index, and driven by gridded numerical weather model data. Regardless of the approach used to obtain leaf wetness, it is important to compare measured or simulated data with visual observations to confirm their accuracy.
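As a simple illustration of the relative humidity approach, the sketch below classifies hours as wet when relative humidity meets or exceeds a threshold. The 90% threshold is an assumption for illustration only; in practice the threshold should be calibrated against visual wetness observations for the crop and site (cf. Sentelhas et al. 2008).

```python
# Minimal sketch of an RH-threshold leaf wetness estimator. The 90%
# threshold is an illustrative assumption, not a universal constant.

def wet_hours(hourly_rh, rh_threshold=90.0):
    """Count hours classified as wet (RH >= threshold, in percent)."""
    return sum(1 for rh in hourly_rh if rh >= rh_threshold)

# Example: hourly relative humidity (%) over one night.
rh_series = [78, 84, 88, 91, 94, 96, 97, 95, 92, 87, 80, 72]
print(f"estimated leaf wetness duration: {wet_hours(rh_series)} h")  # 6 h
```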
Model Construction and Parameterization
Mechanistic pest forecast models require detailed knowledge of the rates of organismal growth and development with respect to key environmental and host variables for construction and parameterization. For arthropod pests, the most commonly used mechanistic approach is the degree-day accumulation model (Herms 2004). Simple degree-day models generally track the development of the average member of the population through its stages. Other more complex approaches simulate the proportion of the population in each of the arthropod's phenological stages using either density functions or demographic models. A variety of density functions can be used to convert a degree-day accumulation into the proportion of the population in each stage (Welch et al. 1978). They include the Weibull (Wagner et al. 1984), Gaussian (Kim et al. 2010), and logistic (Damos and Savopoulou-Soultani 2010) functions. Demographic models estimate the proportion of insects in each stage based on development rate and factors such as mortality, life span, and fecundity (Yonow et al. 2004, Gutierrez et al. 2010, Fand et al. 2014, Tochen et al. 2014). This approach requires additional data for parameterization and more time for model development. Models that simulate phenological development, although potentially more accurate than simple degree-day models, are rarely deployed operationally. The recent development of the Insect Life Cycle Modeling (ILCYM) software (Sporleder et al. 2009), an open-source generic modeling tool that incorporates temperature-driven processes, may remove some of the barriers to mechanistic phenological model development.
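The sketch below illustrates both ideas: a simple average-method degree-day accumulator and a logistic density function that converts the accumulation into the proportion of the population that has completed a stage (cf. Damos and Savopoulou-Soultani 2010). The thresholds and logistic parameters are placeholders, not values for any particular species.

```python
import math

# Illustrative thresholds and parameters (placeholders, not species values).
LOWER_THRESHOLD = 10.0  # base temperature for development, deg C
UPPER_THRESHOLD = 30.0  # upper cutoff, deg C

def daily_degree_days(t_min, t_max):
    """Average-method degree-days with a lower threshold and a simple
    horizontal cutoff at the upper threshold."""
    t_mean = (t_min + min(t_max, UPPER_THRESHOLD)) / 2.0
    return max(0.0, t_mean - LOWER_THRESHOLD)

def proportion_completed(dd, dd50=250.0, k=0.05):
    """Logistic proportion of the population past a stage at dd accumulated
    degree-days; dd50 is the accumulation at which 50% have completed."""
    return 1.0 / (1.0 + math.exp(-k * (dd - dd50)))

# Example: accumulate over a run of daily minimum/maximum temperatures.
dd = 0.0
for t_min, t_max in [(8, 21), (11, 26), (13, 29), (12, 31)]:
    dd += daily_degree_days(t_min, t_max)
print(f"{dd:.1f} DD accumulated; {proportion_completed(dd):.1%} past stage")
```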
One of the most common mechanistic approaches to plant disease forecasting is the infection model. Two useful introductory references to infection modeling are Madden and Ellis (1988) and Magarey and Sutton (2007), with the former providing a comprehensive review of disease forecasting. Simple infection modeling approaches capture the interaction of moisture and temperature variables to estimate infection periods, while more complex procedures include processes such as sporulation, spore maturation, spore release, and spore dispersal. The interaction of moisture and temperature variables has been captured using a number of procedures. For example, a Wallin-type matrix has been employed to forecast potato late blight (Krause and Massie 1975) and tomato diseases (Pitblado 1992). Other examples include a generic temperature–moisture response function parameterized from controlled laboratory experiments (Magarey et al. 2005a, Magarey and Sutton 2007), a simple approach based on degree-hour wetness (Pfender 2003), and an empirically estimated temperature–moisture equation (Lalancette et al. 1988, Madden and Ellis 1988). Examples of more complex plant disease simulators include those developed for apple scab (Rossi et al. 2007), grape downy mildew (Magarey et al. 1991), and assessing the impacts of climate change on plant disease (Bregaglio and Donatelli 2015).
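A minimal sketch of this kind of temperature–moisture interaction follows, patterned on the generic infection model of Magarey et al. (2005a): a unimodal temperature response function scales the minimum wetness duration required for infection. The cardinal temperatures and wetness requirements below are placeholders for illustration, not parameters for a real pathogen.

```python
# Placeholder cardinal temperatures (deg C) and wetness requirements (h).
T_MIN, T_OPT, T_MAX = 7.0, 20.0, 30.0
W_MIN, W_MAX = 6.0, 24.0

def temp_response(t):
    """Unimodal temperature response in [0, 1], peaking at T_OPT."""
    if t <= T_MIN or t >= T_MAX:
        return 0.0
    shape = (T_OPT - T_MIN) / (T_MAX - T_OPT)
    return ((T_MAX - t) / (T_MAX - T_OPT)) * \
           ((t - T_MIN) / (T_OPT - T_MIN)) ** shape

def wetness_requirement(t):
    """Hours of leaf wetness required for infection at temperature t,
    or None when infection is effectively impossible."""
    f = temp_response(t)
    if f <= 0.0:
        return None
    w = W_MIN / f
    return w if w <= W_MAX else None

# Example: does an 8-h wetness period at 15 C meet the requirement?
req = wetness_requirement(15.0)
print(f"required wetness at 15 C: {req:.1f} h; infection: {8.0 >= req}")
```

An operational implementation would also need rules for interrupted wetness periods and dry-off, which this sketch ignores.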
Where data on factors that govern pest development are available from controlled experiments, the mechanistic approach to model construction and parameterization does not require an extensive observational database. However, where data for parameterization are very limited, it may be necessary to use observational data on a closely related species as a starting point. Databases and tables are available containing development requirements for arthropods (Nietschke et al. 2007, Jarosik et al. 2011) and infection requirements for fungal pathogens (Magarey et al. 2005a). In cases where field observation data are abundant, a commonly used protocol for model parameterization is to divide the observed pest data sets into parameterization (calibration) and validation subsets (Bregaglio et al. 2016). A sensitivity analysis may help determine the influence of each parameter and consequently which of them should receive the most careful attention (Berger et al. 1995).
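As an example of how such a sensitivity analysis might look in code, the sketch below perturbs each parameter of a hypothetical model one at a time by ±10% and records the change in a scalar output; `run_model` is a stand-in of our own, not a real pest model.

```python
# One-at-a-time sensitivity sketch: perturb each parameter by +/-10% and
# record the change in the model output. run_model is a hypothetical
# stand-in for a pest model returning a scalar (e.g., predicted severity).

def run_model(params):
    # Toy response: strongly sensitive to 'rate', weakly to 'lag'.
    return params["rate"] ** 2 + 0.1 * params["lag"]

baseline = {"rate": 1.5, "lag": 4.0}
base_out = run_model(baseline)

for name in baseline:
    for frac in (-0.10, 0.10):
        perturbed = dict(baseline)
        perturbed[name] *= 1.0 + frac
        change = run_model(perturbed) - base_out
        print(f"{name} {frac:+.0%}: output change {change:+.4f}")
```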
Model construction and parameterization problems include availability of input data, computational complexity, missing variables, and poor quality data. Models may require environmental inputs that are not readily available to potential users (Magarey et al. 2002). Models may be difficult to program and computationally intensive. Failure may occur if the model is missing an important variable. For example, a model may forecast infection periods but not consider inoculum availability or phenological susceptibility, which may have a greater impact on model performance once a minimum threshold for infection has been exceeded. An instructive example is the expert system approach of Travis et al. (1992), which combines insect or disease forecasts with other risk factors such as phenological susceptibility, cultivar resistance, and the time since the last pesticide application. Poor model parameterization can result from improperly controlled experimental conditions and incorrect measurement of field conditions. Models also suffer when parameters are estimated from experiments that do not well represent the agronomic situation in which the model is deployed. For example, the parameters associated with pest–host interactions in a forecast model may be incorrect if the biotypes of the pests used in laboratory experiments differ from those present in the field.
Validation

A mechanistic model should be validated before implementation in an operational setting. This step ensures that the model is correctly designed and parameterized. Bellocchi et al. (2010) provide a comprehensive review of validation processes for biophysical models. Racca et al. (2010) address validation issues with case studies related to pest models that predict: 1) the first appearance of disease; 2) pest incidence, severity, or abundance; 3) action thresholds; and 4) crop growth stages and pest intensity. Pest models used for operational forecasting for management have different criteria for validation than those used for research purposes (Welch et al. 1981). The former require the estimation of risk associated with model use, whereas statistical hypothesis testing procedures are generally employed for validating research models. From the pest manager's perspective, a model is valid if the risk of failure is less than some maximum acceptable level. Thus, one of the challenges for model validation is developing criteria for defining model success and failure. In developing criteria, both model outputs and observations, such as pest numbers, disease severity or incidence, and first observation of a phenological stage or pest symptom, need to be related to the management decision. These criteria may need to include what Welch et al. (1981) call an indifference band, i.e., a ± uncertainty interval around the observed or true value to indicate model errors that are too small to impact the management decision. In order to compare model outputs and pest observations statistically, it is necessary to put them in comparable units. For pest counts, one method is to standardize both observations and model output by their maximum values, creating an index between 0 and 1. Where the distribution is skewed by a few large values, standardize by the 95th or 99th percentile instead and set values above it to 1.
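The following sketch shows one way to build such an index, capping counts at the 95th percentile so a few large values do not compress the rest of the scale; the function and trap-count values are illustrative.

```python
import statistics

# Rescale pest counts to a 0-1 index, capping at the 95th percentile so a
# few large values do not compress the rest of the scale.

def to_index(counts, pct=95):
    cap = statistics.quantiles(counts, n=100)[pct - 1]  # pct-th percentile
    return [min(c, cap) / cap for c in counts]

trap_counts = [0, 2, 3, 5, 8, 12, 15, 18, 22, 140]  # 140 skews the maximum
print([round(v, 2) for v in to_index(trap_counts)])
```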
For some diseases, validation may be challenging because disease symptoms are difficult to observe in the field, perhaps due to the cryptic nature of the disease, a long latent period, or an infection that occurs on roots in the soil. Validation may instead be conducted by comparing treatments made according to a calendar spray program with those scheduled by a forecast model (Grünwald et al. 2002). The availability of historical grid weather data now allows mechanistic models to be easily run against pest observations from the literature, providing opportunities to verify pest models without the expense of conducting new field experiments. However, the process of model validation can be undermined by a lack of standardization among pest observations and errors in transcribing data. The collection of pest observation data sets with low temporal resolution (e.g., observations collected by scouts for pest management) but high spatial resolution (e.g., multiple observations across a wide geographic area) is an important validation method (Welch et al. 1981). Projects such as the Integrated Pest Information Platform for Extension and Education (Isard et al. 2015) and EDDMapS (Snyder 2015) provide platforms for pest data sharing and have the potential to make these kinds of pest observation data sets more available to researchers. Although pest model validation is commonly done by the model developer, there is benefit in a more formal validation process with independent testing (Paez 2009).
Implementation

Pest model outputs are normally delivered through a Decision Support System (DSS). Decision Support Systems can range from the very simple, such as a table, to a complex expert system (Magarey et al. 2002). The ultimate test of a DSS is the extent to which its outputs are employed by agricultural practitioners in their crop management decision making. There are a number of excellent examples of successful DSS implementations, including strawberry disease forecasting (Pavan et al. 2011), tomato diseases (Gleason et al. 1995), potato late blight (Racca et al. 2010), apple arthropods and diseases (Jones et al. 2010), pests on horticultural crops (Finch et al. 1996), and arthropods and diseases on multiple crops (Russo 2000). There are many barriers that can prevent the adoption of a DSS by users, including the time required to interact with the system and the ease of access on a user's smart device (Magarey et al. 2002). Decision Support System outputs must also be expressed in a language users can easily understand and must correspond with existing, cost-effective management options. Controlling plant pests is just one of many management considerations for stakeholders. Consequently, determining the usefulness of model outputs requires interaction with pest managers and stakeholders to understand their responsibilities, decision schedules, information needs, and available management options. McRoberts et al. (2011) suggest weighing the skill associated with the forecasted probability of disease (or of a pest exceeding its economic impact threshold) against the expected regret (i.e., the expected impacts of an incorrect forecast). The authors propose an interdisciplinary research framework that could allow epidemiologists, social scientists, economists, and risk analysts to identify barriers that prevent many users from adopting DSS.
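As a simplified, hypothetical illustration of weighing forecast probability against expected regret, the sketch below applies the classic cost-loss rule: treat when the expected loss from not treating (probability times preventable loss) exceeds the treatment cost. This is our reduction of the idea for illustration, not the framework of McRoberts et al. (2011).

```python
# Cost-loss sketch: treat when p * L > C, i.e., when the forecast
# probability exceeds the cost/loss ratio C/L. All values are hypothetical.

def should_treat(p_outbreak, cost_treat, loss_if_untreated):
    """Treat when expected loss without treatment exceeds treatment cost."""
    return p_outbreak * loss_if_untreated > cost_treat

# Example: a $40/ha treatment against a potential $300/ha loss is justified
# whenever the forecast probability exceeds 40/300, about 13%.
for p in (0.05, 0.10, 0.20, 0.50):
    print(f"p = {p:.2f}: treat = {should_treat(p, 40.0, 300.0)}")
```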
Conclusions

The frequency of pest forecast model failure is not well documented, as scientists generally report success stories and rarely failures. Mechanistic models can fail in the construction and parameterization, validation, and implementation phases. Common symptoms of forecast model failure that we have experienced are organized by phase in Box 1. In general, symptoms that are expressed in the first phase of the modeling process involve environmental inputs and are relatively easy to troubleshoot. In the process of developing a new mechanistic pest forecast model, consider beginning at the bottom of Box 1, focusing on the management actions that will be impacted by the model output. This bottom-up approach should decrease the frequency of failures that result from the development of models that produce outputs that are not useful to stakeholders, or are useful but not used by them. A tension exists in the model development process: pest risk modelers are often trained as plant pathologists, entomologists, and weed scientists, with a tendency to focus on pest and host biology, whereas meteorologists and other environmental scientists tend to focus on the aspects of pest risk modeling associated with the collection of weather data and controlled or field experiments to parameterize and validate models. The successful provision of pest risk forecasts is inherently complex. Clearly, a collaborative, multidisciplinary approach involving researchers with expertise in pest biology, crop management, meteorology, and information technology is appropriate. Modeling projects that address implementation with input from potential users are far more likely to be successful than approaches that consider model implementation as an afterthought.
Box 1. Common symptoms of pest forecast model failure, with possible causes and potential solutions, organized by phase.

Environmental Inputs
Symptom: Model well represents observations for most locations and times but fails occasionally at a few locations.
Possible cause: Weather data from anomalous locations have incorrect (suspect or missing) values.
Potential solution: Check/implement quality assurance procedures; calibrate and maintain weather station instrumentation (Shafer et al. 2000).
Symptom: Model well represents observations for most locations and times but fails frequently at a few locations.
Possible cause: Gridded weather data may be incorrect for the anomalous location.
Potential solution: Confirm latitude and longitude using a GIS viewer such as Google Earth; check for latitude and longitude transposition and correct ± sign; review procedure for collecting and georeferencing data (Chapman and Wieczorek 2006).
Possible cause: Resolution of gridded data is too coarse.
Potential solution: Correct weather variables for differences between mean elevation of grid cell and elevation of location (Hong et al. 2015); use an algorithm that downscales to a finer resolution (Royer et al. 1989).
Symptom: Model fails to well represent observations for most locations and times.
Possible cause: Units of weather data do not match those required by model.
Potential solution: Check units of weather data.
Possible cause: Weather data (especially relative humidity and/or leaf wetness) do not well represent the canopy environment.
Potential solution: Simulate in-canopy conditions (e.g., leaf wetness) with a model calibrated for the crop canopy (Magarey et al. 2006) and verify against visual observations.
Model Construction and Parameterization

Symptom: Model fails to well represent observations for most locations and times.
Possible cause: Important agronomic factor(s) and/or environmental variables are not included in model.
Potential solution: Consult with subject matter experts; conduct research into pest and cropping system to determine if there are other important factors that should be included in the model.
Possible cause: Model coding error.
Potential solution: Check equations and units of parameters and variables; obtain input and output files from model developer and run the model to verify that the model is correctly coded.
Possible cause: Model parameters do not well represent the pest and cropping system.
Potential solution: Re-parameterize model using additional data on pest or closely related species; conduct additional research.
Validation

Symptom: Model fails testing at some or all locations.
Possible cause: Errors transcribing validation data.
Potential solution: Check pest observation data.
Possible cause: Errors in converting field observations to units comparable to model outputs.
Potential solution: Check method for standardizing data.
Symptom: Model fails to well represent observations for most locations and times.
Possible cause: Quality of weather data is less than that used for model development and validation.
Potential solution: Obtain weather data of quality comparable to that used for model development (see Environmental Inputs).
Symptom: Model developed using data from a set of locations and times fails to well represent observations for study sites.
Possible cause: Values of inputs for study sites exceed the range for which the model was constructed.
Possible cause: Pest biology and/or agronomic conditions at model development locations and study locations differ.
Potential solution: Consult with model developers; identify an alternative model; or reparameterize the model using both the development and validation data sets and find new validation data.
Implementation

Symptom: Validated model output does not well represent stakeholder field observations.
Possible cause: Research observations used for validation are not representative of pest observations from stakeholders' fields.
Potential solution: Re-parameterize and validate the model using pest observations collected in stakeholders' fields.
Symptom: Model outputs not useful to stakeholders.
Possible cause: Model does not supply information that is critical to management decision.
Potential solution: Redesign model outputs to better address management options.
Possible cause: Model output may not allow time for management response.
Potential solution: Deliver outputs earlier, e.g., by driving the model with forecast rather than observed weather data.
Symptom: Stakeholders do not use model outputs.
Possible cause: Using the DSS is too time consuming.
Potential solution: Streamline the user/DSS interaction, e.g., by estimating some parameters without the need for user input.
Possible cause: Model outputs are confusing to stakeholders.
Potential solution: Redesign DSS outputs, based on feedback from stakeholders, using easier-to-understand formats.
Possible cause: The DSS is not compatible with technologies used by stakeholders.
Potential solution: Enable DSS outputs to be delivered to user’s smart devices (Baylis 2014).
Acknowledgments

We wish to acknowledge Joe Russo, ZedX, Inc., for valuable insights on pest modeling and weather data. This work was supported by the United States Department of Agriculture, National Institute of Food and Agriculture, Agriculture and Food Research Initiative (USDA-NIFA AFRI) Competitive Grants Program Food Security Challenge Area Grant 2015-68004-23179.