Nokome Bentley, Terese H. Kendrick, Paul J. Starr, Paul A. Breen, Influence plots and metrics: tools for better understanding fisheries catch-per-unit-effort standardizations, ICES Journal of Marine Science, Volume 69, Issue 1, January 2012, Pages 84–88, https://doi.org/10.1093/icesjms/fsr174
Abstract
Standardization of catch per unit effort using generalized linear models (GLMs) is a common procedure that attempts to remove the confounding effects of variables other than abundance. Simple plots and metrics are described to assist understanding the standardization effects of explanatory variables included in GLMs, illustrated with an example based on New Zealand trevally (Caranx lutescens) data.
Bentley, N., Kendrick, T. H., Starr, P. J., and Breen, P. A. 2012. Influence plots and metrics: tools for better understanding fisheries catch-per-unit-effort standardizations. – ICES Journal of Marine Science, 69: 84–88.
Introduction
For many fisheries, catch per unit effort (cpue) is a key index of abundance; for some, it is the only index. However, using cpue as an index of abundance is notoriously problematic (Hilborn and Walters, 1992; Maunder et al., 2006). For instance, cpue may be “hyperstable” or insensitive to changes in abundance (Harley et al., 2001). Changes in fishing pattern over time can also cause distortions in the relationship between cpue and abundance (Bishop, 2006; Ye and Dennis, 2009). The fleet may change in vessel composition or may change its main fishing season or areas, causing changes to cpue that are independent of stock abundance.
Standardizing cpue with generalized linear models (GLMs) provides one way to remove some of these effects; for overviews of this approach, see Maunder and Punt (2004) and Venables and Dichmont (2004). Cpue can also be standardized by estimating regression coefficients along with other assessment model parameters in an integrated stock assessment model that uses other fishery and biological data (Maunder, 2001; Maunder and Langley, 2004). However, the complexity of many age-structured and length-based assessment models leads to a segregated approach, with cpue standardized outside the model using a GLM.
Cpue standardization using GLMs is not an attempt to build a predictive model for forecasting or to explain variance in a dataset. Rather, the GLM is used in an attempt to remove the confounding effects of extraneous variables, resulting in an index that is as representative as possible of the vulnerable stock biomass. There is therefore a need to explore the results, rather than simply accepting the cpue indices arising from a GLM, and to understand the standardization effects achieved by including each of the explanatory variables in the model.
Cpue standardization with modern statistical packages frequently involves many variables. Even when stepwise selection methods are used, it is common for many explanatory variables to be statistically significant because of the large size of the catch-and-effort datasets (Maunder and Punt, 2004; Bishop et al., 2008). Consequently, when standardized indices have a pattern that differs from that of the unstandardized cpue, the reason can be difficult to understand: what effect, for instance, does each of the explanatory variables have on the resulting indices of abundance?
Here, we suggest simple visualizations and metrics that are useful both for understanding how a GLM has removed the confounding effects of explanatory variables and for conveying the results of cpue standardizations to stakeholders.
Methods and example application
The methods are illustrated using an example of cpue standardization drawn from Kendrick and Bentley (2010), in which standardized cpue was developed for bottom trawling in New Zealand's trevally (Caranx lutescens) stock TRE 7. The reader is referred to Kendrick and Bentley (2010) for descriptions of data grooming, defining the fishery, defining core vessels, and other details. The defined core-vessel fleet made up 30% of the total fleet, but it caught 80% of the reported catch. Data were aggregated by vessel, trip, statistical area, fishing method, and target species to produce individual records for the GLM.
Model terms in Equation (1) are listed in the order that they entered the GLM during stepwise selection. Table 1 provides the usual GLM summary statistics associated with each explanatory term: degrees of freedom, the increase in the deviance explained, and the AIC of the model after the term is included. Based on these statistics, log(tows) had the greatest explanatory power, followed by month and vessel.
Term | Degrees of freedom | Explanatory power: increase in deviance explained | Explanatory power: r2 (%) | Explanatory power: AIC | Overall influence (%)
---|---|---|---|---|---
Year | 18 | 1 306.4 | 3.8 | 40 270 | –
Log(tows) | 1 | 7 773.2 | 22.9 | 37 553 | 10.8
Month | 11 | 3 649.0 | 10.7 | 35 989 | 7.5
Vessel | 26 | 2 924.2 | 8.6 | 34 559 | 19.8
Target | 1 | 1 267.7 | 3.7 | 33 843 | 10.3
Area | 5 | 632.3 | 1.9 | 33 475 | 7.7
AIC, Akaike Information Criterion; r2, proportion of deviance explained. The overall influence metric is explained in text.
In this example, standardized cpue differed dramatically from unstandardized cpue (Figure 1): unstandardized cpue exhibited an increasing trend, with a rapid increase in 2006, whereas standardized cpue underwent an initial decrease for 3 years before flattening out. It is therefore important to understand how the explanatory variables caused the difference between these two cpue trends.
A simple way to explore the effects of explanatory variables, described by Bishop et al. (2008), is to plot the year indices that result as each explanatory variable is added to the model to see how they change (Figure 2a). Although log(tows) has high explanatory power, its addition to the model causes only a slight change in annual cpue indices. In contrast, there is a large change in the indices when vessel coefficients are introduced. This “step plot” suggests that the vessel variable has a major influence on standardized cpue, i.e. that changes in relative effort among vessels have a large influence on the pattern of unstandardized cpue independent of abundance.
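The numbers behind a step plot can be sketched in a few lines. The Python below is only an illustration under stated assumptions: it fits log(cpue) by ordinary least squares (a lognormal error assumption, which may differ from the model actually used for TRE 7), the data are hypothetical and noise-free, and the function names `solve` and `year_indices` are invented for this sketch. The model is refitted with each added term and the year indices are compared between steps.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def year_indices(records, terms):
    """Fit log(cpue) ~ year + terms by least squares (an assumed model form)
    and return year indices scaled to a geometric mean of 1."""
    years = sorted({r["year"] for r in records})
    # Dummy coding: the first level of every factor is the reference level.
    columns = [("year", y) for y in years[1:]]
    for term in terms:
        levels = sorted({r[term] for r in records})
        columns += [(term, lv) for lv in levels[1:]]
    X = [[1.0] + [1.0 if r[name] == lv else 0.0 for name, lv in columns]
         for r in records]
    y = [math.log(r["cpue"]) for r in records]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    effects = [0.0] + beta[1:len(years)]  # reference year plus year dummies
    mean = sum(effects) / len(effects)
    return {yr: math.exp(e - mean) for yr, e in zip(years, effects)}

# Hypothetical data: abundance falls each year while effort shifts from
# vessel "A" (coefficient 0) to the more efficient vessel "B" (coefficient 1).
true_year = {2000: 0.0, 2001: -0.2, 2002: -0.4}
true_vessel = {"A": 0.0, "B": 1.0}
fleet = {2000: "AAAB", 2001: "AABB", 2002: "ABBB"}
records = [
    {"year": y, "vessel": v, "cpue": math.exp(true_year[y] + true_vessel[v])}
    for y, vessels in fleet.items() for v in vessels
]

steps = {
    "year": year_indices(records, []),
    "+ vessel": year_indices(records, ["vessel"]),
}
for label, idx in steps.items():
    print(label, {y: round(v, 3) for y, v in idx.items()})
```

In this toy case the year-only indices rise because effort moves to the more efficient vessel, whereas the indices fall once vessel is added. This is the same kind of reversal that a step plot is meant to expose.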
Here, we suggest a way of quantifying the “influence” that each explanatory variable has on the unstandardized cpue in each year. The metrics presented provide a measure of the contribution of each explanatory variable to the difference between standardized and unstandardized annual year effects. Step plots provide some indications, but they show only incremental changes in the cpue index rather than the relative influence of each explanatory variable in the final model.
A measure of the influence of an explanatory variable can be derived from the GLM coefficients associated with that variable. Variables with coefficients of high magnitude do not necessarily have great influence. For a variable to have great influence, there must be changes in the relative distribution of that variable among years. When there are changes in the distribution of a variable, there are also changes in the coefficients associated with the variable. In a year when the values of the variable differ from their average, the variable will have greater influence on the difference between unstandardized and standardized cpue.
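The reasoning above can be illustrated with a minimal Python sketch. It assumes that the influence of a variable in a given year is the mean coefficient over that year's records minus the mean coefficient over all records, on the log scale; this is one reading of the description above rather than a transcription of the authors' equations. The function name and the two-level vessel data are hypothetical.

```python
from collections import defaultdict

def annual_influence(records, coefficients):
    """records: list of (year, level) pairs; coefficients: level -> GLM
    coefficient on the log scale. Returns year -> assumed influence."""
    overall_mean = sum(coefficients[lv] for _, lv in records) / len(records)
    by_year = defaultdict(list)
    for year, lv in records:
        by_year[year].append(coefficients[lv] - overall_mean)
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}

# Hypothetical coefficients of high magnitude for two vessel types.
coefficients = {"slow": -1.0, "fast": 1.0}

# Same mix of vessels every year: no influence despite large coefficients.
stable = annual_influence(
    [(2000, "slow"), (2000, "fast"), (2001, "slow"), (2001, "fast"),
     (2002, "slow"), (2002, "fast")],
    coefficients,
)

# Mix shifts from slow to fast vessels: influence trends upward.
shifting = annual_influence(
    [(2000, "slow"), (2000, "slow"), (2001, "slow"), (2001, "fast"),
     (2002, "fast"), (2002, "fast")],
    coefficients,
)

print(stable)
print(shifting)
```

The same coefficients produce no influence when the vessel mix is constant and a strong trend when the mix shifts, which is the distinction drawn in the paragraph above.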
Figure 2b shows the annual influence for each explanatory variable in the final standardization model. The annual influence of the vessel effect increases substantially from 1990 to 2008, mirroring the change in the cpue index seen in the step plot after “vessel” is introduced into the model (Figure 2a). The target variable also shows a similar increasing trend in annual influence, particularly in the final few years. Note that the step plot (Figure 2a) implies that area has little influence (when it is added to the model, the cpue index changes very little). In contrast, the plot of annual influence (Figure 2b) suggests that, although small compared with other variables, the influence of area is not inconsequential and has been increasing. The difference arises through partial confounding between area and vessel, which the step plot fails to capture, but which is represented in the final model coefficients and hence the annual influence measures.
As described above, the pattern of annual influence for an explanatory variable arises from a combination of its GLM coefficients and its distributional changes over years. It is common to provide separate tables or graphs of this information (e.g. McKenzie, 2008), but to understand the patterns of annual influence, it is useful to combine the coefficient values, the distributional changes, and the annual influence into a single plot: the coefficient–distribution–influence (CDI) plot.
Figure 3 is the CDI plot for vessel, the variable that shows the greatest variation in annual influence. The top panel of the plot provides normalized coefficients and their standard errors. In the bottom left panel, bubbles indicate the annual distribution of records across each level of the variable (in this case V); this is the proportion of records from each vessel in each year. The bottom right panel shows the annual values of influence for vessel calculated from Equations (2) and (4) or, equivalently, matrix multiplication of the normalized coefficients in the top panel by the proportions in the bottom left panel. Note that there is a shift in the proportion of records (bottom left panel), with vessels with low coefficients dropping out of the fishery, and vessels with a high coefficient remaining (top left panel), causing the increasing trend in influence (right panel).
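The three panels of a CDI plot correspond to three small data structures, sketched below in Python. The sketch assumes that "normalized" means centred on the record-weighted mean coefficient, which the text does not define explicitly; the counts, coefficients, and function names are hypothetical.

```python
def normalize(coefficients, counts):
    """Centre coefficients on their record-weighted mean (an assumption)."""
    total = sum(n for by_level in counts.values() for n in by_level.values())
    mean = sum(coefficients[lv] * n
               for by_level in counts.values()
               for lv, n in by_level.items()) / total
    return {lv: c - mean for lv, c in coefficients.items()}

def proportions(counts):
    """Proportion of each year's records at each level (bottom left panel)."""
    return {year: {lv: n / sum(by_level.values()) for lv, n in by_level.items()}
            for year, by_level in counts.items()}

def influence(props, normalized):
    """Proportions multiplied by normalized coefficients (right panel)."""
    return {year: sum(p * normalized[lv] for lv, p in by_level.items())
            for year, by_level in props.items()}

# Hypothetical record counts by year and vessel, and vessel coefficients.
counts = {2000: {"v1": 3, "v2": 1}, 2001: {"v1": 1, "v2": 3}}
coefficients = {"v1": 0.0, "v2": 1.0}

normalized = normalize(coefficients, counts)   # top panel
props = proportions(counts)                    # bottom left panel
infl = influence(props, normalized)            # bottom right panel
print(normalized, props, infl)
```

Here the share of records from the high-coefficient vessel rises from a quarter to three-quarters, so the influence moves from negative to positive.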
A second example is the CDI plot for month (Figure 4), a variable that has high explanatory power (Table 1), but which shows little influence on annual cpue indices (Figure 2b). The coefficients for each month (top left, Figure 4) show as much variation as the coefficients for each vessel (top left, Figure 3). However, in contrast to vessel, there are only small changes in the distribution of records among months (bottom left panel), resulting in only small and variable changes in annual influence (right panel). The high influence in 1999 arises because there was a greater than usual proportion of effort in the months with the highest coefficients (November–March).
For some categorical explanatory variables, there is a natural order, as there is for month. For others, such as vessel, there is no natural order, and the CDI plot is more easily understood if the variables are ordered by their coefficient values. In Figure 3, the vessels were ranked from lowest to highest coefficient. For continuous variables, such as the number of tows or temperature, distributions can be plotted by binning the variables in some appropriate manner.
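Binning a continuous variable for the distribution panel might look like the Python sketch below. The bin edges, the tow counts, and the function names are hypothetical; what counts as an "appropriate" binning is left to the analyst.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) containing value.
    The last bin is open-ended; values below edges[0] are rejected."""
    i = bisect_right(edges, value) - 1
    if i < 0:
        raise ValueError(f"{value} is below the first bin edge {edges[0]}")
    return i

def annual_bin_proportions(records, edges):
    """records: list of (year, value). Returns year -> {bin index: proportion}."""
    counts = defaultdict(Counter)
    for year, value in records:
        counts[year][bin_index(value, edges)] += 1
    return {
        year: {b: n / sum(c.values()) for b, n in sorted(c.items())}
        for year, c in sorted(counts.items())
    }

# Hypothetical tows per record; bins are 1-2, 3-5, 6-10, and 11 or more tows.
edges = [1, 3, 6, 11]
records = [(2000, 2), (2000, 4), (2000, 7), (2000, 1),
           (2001, 12), (2001, 8), (2001, 6), (2001, 3)]

props = annual_bin_proportions(records, edges)
print(props)
```

The resulting proportions play the same role as the per-vessel proportions in Figure 3, with bins ordered naturally along the axis.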
For the TRE 7 example, the overall influence statistics (Table 1) suggest that vessel was by far the most influential variable, at 20%, and that the other variables had influences of 7–11%. The substantial trend in the influence of vessel (Figure 2b) explains much of the difference in slope between standardized and unstandardized indices (Figure 1). The trend is also seen clearly in the lower right part of Figure 3.
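One way to collapse annual influences into a single percentage is sketched below in Python. The formula, the exponential of the mean absolute annual influence on the log scale minus one, is an assumption made for this sketch because the defining equation is not reproduced in this text; the function name and input values are hypothetical.

```python
import math

def overall_influence(annual_log_influences):
    """Assumed summary: exp(mean absolute annual influence) - 1."""
    mean_abs = (sum(abs(d) for d in annual_log_influences)
                / len(annual_log_influences))
    return math.exp(mean_abs) - 1.0

# A variable whose distribution never changes has no overall influence.
flat = overall_influence([0.0, 0.0, 0.0, 0.0, 0.0])

# A variable whose annual influence trends from negative to positive.
trending = overall_influence([-0.2, -0.1, 0.0, 0.1, 0.2])

print(f"{flat:.1%} {trending:.1%}")
```

Under this assumed definition the metric measures how far, on average, the variable moves the index in any year, regardless of direction, so a variable with a strong trend in annual influence scores highly.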
The influence of a variable in a standardization model is independent of its explanatory power. In the TRE 7 example, the variable with the most explanatory power was not the most influential, as measured by the overall influence statistic (Table 1). Variables with high explanatory power are typified by large coefficients and large variation among records; those with large influence are typified not only by large coefficients, but also by great variation among years. In TRE 7, the month variable, with high explanatory power but low influence, had large coefficients, but the variable tended to be nearly uniformly distributed among years (Figure 4). The vessel variable, with less explanatory power but more influence, was heterogeneous among years (Figure 3).
Hence, the difference between standardized and unstandardized cpue in TRE 7 (Figure 1) can be explained mainly by the model's accounting for changes in the fleet towards more efficient vessels, seen in a greater proportion of records coming from vessels with high coefficients in more recent years (Figure 3). There was also some effect of shifts in fishing practice, with more targeting of trevally, seen in the influence plot (Figure 2b) and overall influence metric for the target-species variable (Table 1).
Discussion
The tools described here help in understanding why a cpue standardization has produced the result it has. They allow the analyst to tease apart the effects of a GLM to gain more insight, and hence generate greater confidence in the standardization process. Without diagnostic tools of this type, it is easy for a GLM to be a black-box analysis from which cpue indices simply emerge with little understanding of the reasons for the result. We have used these tools for several cpue standardizations and found them invaluable in explaining the results to technical working groups, fisheries managers, and stakeholders.
The tools do not address any of the potential problems with using GLMs to standardize cpue or the influence of unknown variables on cpue (Bishop, 2006). They do not address any basic problems with cpue as an abundance index, such as hyperstability or hyperdepletion (Hilborn and Walters, 1992), but they do provide a means for better understanding the GLM results and can perhaps assist in exploring such potential problems.
A package for the R statistical language which can generate step plots, influence plots, CDI plots, and influence metrics is available at http://projects.trophia.com/influ.
Acknowledgements
This approach was developed during projects funded by, and involving, Area 2 Fisheries Management Company, Cawthron Institute, the New Zealand Ministry of Fisheries, and the New Zealand Seafood Industry Council. Thanks are due to members of the New Zealand Stock Assessment Methods Working Group, Shelton Harley, and an anonymous reviewer for useful suggestions on an earlier draft. Kevin Stokes suggested the term “influence” to us.
References
Author notes
Handling editor: Emory Anderson