Operationalizing ensemble models for scientific advice to fisheries management

There are uncertainties associated with every phase of the stock assessment process, ranging from the collection of data, assessment model choice, model assumptions and interpretation of risk to the implementation of management advice. The dynamics of fish populations are complex, and our incomplete understanding of those dynamics (and limited observations of important mechanisms) necessitates that models are simpler than nature. The aim is for the model to capture enough of the dynamics to accurately estimate trends and abundance and to provide advice to managers about sustainable harvests. The status quo approach to assessment modelling has been to identify the 'best' model, based on diagnostics and model selection criteria, and to generate advice from that model, mostly ignoring advice from other model configurations regardless of how closely they performed relative to the chosen model. We review the suitability of the ensemble modelling paradigm to more fully capture uncertainty in stock assessment model building and the provision of advice. We recommend further research to evaluate potential gains in modelling performance and advice from the use of ensemble modelling, while also suggesting revisions to the formal process for reviewing models and providing advice to management bodies.


Another type of ensemble model, the 'super-ensemble', has recently received attention in fisheries. Super-ensembles refer to a technique where the ensemble is built by modelling the predictions of the ensemble's components, which may include covariates that were not used by the component models. To date, however, there are few examples of operational use of ensemble models to provide management advice. The current process of scientific advice is still strongly grounded in selecting a single stock assessment framework and a single configuration from a set of competing candidate models and configurations.
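The super-ensemble idea can be illustrated as a regression of an observed quantity of interest on the component models' predictions. The sketch below is generic and self-contained; all data, component models and names are invented for illustration, not taken from any particular assessment framework.

```python
import numpy as np

# Minimal super-ensemble sketch: regress an "observed" quantity of interest
# on the predictions of the ensemble's components. All values are made up.
rng = np.random.default_rng(1)

truth = 100 + 10 * np.sin(np.linspace(0, 3, 30))   # hypothetical observed QoI
preds = np.column_stack([                          # component model predictions
    truth + rng.normal(0, 5, 30),                  # model A: unbiased but noisy
    0.8 * truth + rng.normal(0, 2, 30),            # model B: biased but precise
])

# Fit the super-ensemble: a linear model of the observations on component
# outputs (the intercept acts as a simple bias-correcting covariate).
X = np.column_stack([np.ones(len(truth)), preds])
coefs, *_ = np.linalg.lstsq(X, truth, rcond=None)
combined = X @ coefs

rmse = lambda y: np.sqrt(np.mean((y - truth) ** 2))
print(rmse(preds[:, 0]), rmse(preds[:, 1]), rmse(combined))
```

Because least squares minimises error over all linear combinations of the components, the combined prediction can do no worse in-sample than any single component; real applications would of course validate out of sample.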

The following sections explore methodological issues (section 2) and discuss the utilization of ensemble models in the provision of advice. An ensemble starts from multiple working hypotheses about alternative states of nature (Chamberlin, 1965). We refer to this theoretical set of models as the model space, a complete and continuous representation of the system under study. Acknowledging that fisheries systems are too complex to be described by a single model (Tebaldi & Knutti, 2007; Chatfield, 1995; Draper, 1995; Stewart & Martell, 2015; Millar et al., 2015), ensemble members may be chosen by their capacity to model different parts of the system and capture structural uncertainty. The ensemble members should be complementary, and ensemble methods should integrate across distinct representations of the system, hopefully covering the most important processes, to estimate QoIs.

In contrast to structural uncertainty, ensemble members may be chosen to deal with parametric uncertainty.

Structural uncertainty has a major impact on ensemble modelling because it forces the analyst to rethink the modelling approach. Instead of choosing the 'best model' at the end of the model selection process, ensemble modelling requires defining a full range of models at the very beginning. Figure 1 depicts simplified workflows of model selection and ensemble modelling. The differences between the two processes do not seem extreme, although ensemble modelling requires much more emphasis on choosing models, metrics, methods and QoIs than a conventional selection process, where models are discarded until the best one emerges. There are several methods that can be used to combine model outcomes and estimate QoIs.

The most common way to compute ensemble estimates is to use some version of model averaging, with weights derived from information theory metrics; in practice this restricts the ensemble to a few models. A further restriction to using information theory metrics is that the data must be the same across models (Burnham & Anderson, 2002). In assessment models, this restriction would also extend to the data weighting that is sometimes specified, i.e., scores between models would not be comparable if different data weights are assumed in each model.
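For models fitted to the same data, information theory weights such as Akaike weights (Burnham & Anderson, 2002) are straightforward to compute. The sketch below shows the standard calculation; the AIC scores are hypothetical.

```python
import numpy as np

# Akaike weights: convert AIC scores into model-averaging weights.
# Only valid if every model was fitted to the same data (and, for
# assessment models, with the same data weighting).
def akaike_weights(aic):
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()        # AIC differences from the best model
    rel = np.exp(-0.5 * delta)     # relative likelihoods
    return rel / rel.sum()         # normalise so the weights sum to one

# Hypothetical AIC scores for three candidate assessment configurations
w = akaike_weights([1002.1, 1000.0, 1010.5])
print(w)
```

Models with an AIC difference larger than about 10 receive essentially zero weight, which is one reason these weights effectively limit the ensemble to a few closely competing models.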

Tactical weights are based on the models' capability of forecasting or predicting QoIs.

Historical performance of each model, hindcasts, cross-validation, experts' opinions or a mix of several of the aforementioned methods can be used to compute these metrics. The idea is to capture a feature of the model that is relevant for the objective of the analysis. For example, if the ensemble is used to forecast, then using each member's forecast capability, also called model 'skill', seems intuitive. An advantage of this approach is that one could relax the restrictions of information theory metrics and potentially extend tactical metrics to encompass several modelling approaches. The QoIs themselves can take several forms, which condition how ensemble estimates and their uncertainty are computed:

• Vector: The outcome is a vector, e.g., biomass by year. An analytical solution may be difficult to derive and using resampling methods may be the best option, in which case it is important to take auto-correlation into account.

• Matrix or array: The outcome is a matrix, e.g., population numbers by year and age.

An analytical solution may be difficult to derive and using resampling methods may be the best option, in which case it is important to take into account within-model correlations across years and ages.

• Full stock and fisheries dynamics: The ensemble is used to build operating models that require several matrices. In such cases the quantities combined need to have some degree of coherence across them, e.g., abundance in numbers by year and age and fishing mortality by year and age. Analytical solutions are not available and resampling methods seem to be the only alternative, in which case correlation structures need to be accounted for, both within and across variables.
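One simple way to preserve within-model correlation across years and ages when resampling matrix-valued QoIs is to draw whole replicates from each model, with models chosen in proportion to their ensemble weights. The sketch below assumes each model supplies a stack of replicate year-by-age matrices; all dimensions, weights and values are invented.

```python
import numpy as np

# Weight-proportional resampling of matrix-valued QoIs. Drawing whole
# replicate matrices (rather than individual cells) keeps each model's
# internal correlation across years and ages intact.
rng = np.random.default_rng(7)

n_rep, n_year, n_age = 500, 20, 8
model_reps = [                                     # hypothetical replicate stacks
    rng.normal(100, 10, (n_rep, n_year, n_age)),   # model A
    rng.normal(120, 15, (n_rep, n_year, n_age)),   # model B
]
weights = np.array([0.7, 0.3])                     # hypothetical ensemble weights

n_draw = 1000
which = rng.choice(len(model_reps), size=n_draw, p=weights)
ensemble = np.stack([
    model_reps[m][rng.integers(n_rep)]             # one full (year x age) matrix
    for m in which
])                                                 # shape: (n_draw, n_year, n_age)
print(ensemble.shape)
```

When several correlated quantities are needed (e.g., abundance and fishing mortality), the same index draws should be applied to all of them so that cross-variable correlation is also preserved.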

Table 1 shows the linkage between QoIs and applications. With the increasing complexity of the applications (stock status, forecasts and operating models), the complexity of the data product also increases. To estimate the status of a stock, a single or bivariate variable may be sufficient. When it comes to forecasts, a full understanding of the stock's exploitation history and productivity will be necessary, and QoIs will be time series of projections under certain conditions. In data-rich situations forecasts will also use matrices, like population abundance and selectivity by age or length. Obviously, information about the status of the stock(s), mentioned above, will be needed to set proper conditions for the analysis of future fishing opportunities. With regards to building operating models, all of the previous will be needed plus several age or length structures of the population, fleet selectivity, population productivity and, although less commonly used, socio-economic information. In this case, several correlated matrices will need to be included in the ensemble results.

The ensemble should not be treated as a dumpster for group indecision, nor should non-credible model structures be included with the hope that the analysis will reject or severely penalize them.

Moving from the current single best model approach to an ensemble approach is not as straightforward as it may seem.

The current spectrum of stock assessment methods is very diverse. Analytical methods, which require age- or length-based data, range from virtual population analysis to state-space models, including statistical catch-at-age methods. Data-limited methods broaden this spectrum further.

Further development of general, modular, extensible, well-tested and well-documented software systems is required. The lack of consistency in the output from the plethora of available stock assessment frameworks is probably one of the main factors limiting an immediate trial of ensemble models. Although difficulties are inevitable when dealing with real cases, having a common framework should allow solutions to be discussed and shared within a large group of people dealing with similar problems. We therefore emphasize the importance of standardizing formats of assessment outputs to facilitate collaboration and model comparisons and make the process of ensemble modelling more efficient.
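To make the idea of a standardized output format concrete, the sketch below shows one possible shape for such a record. None of these field names follow an existing standard; they only illustrate the kind of common, validated structure that would make cross-framework comparison and ensemble building easier.

```python
from dataclasses import dataclass, field

# Hypothetical standardised assessment-output record. Field names are
# illustrative assumptions, not an existing exchange format.
@dataclass
class AssessmentOutput:
    framework: str                     # e.g. "SS3", "SAM", "a4a"
    stock: str
    years: list[int]
    ssb: list[float]                   # spawning stock biomass by year
    fbar: list[float]                  # mean fishing mortality by year
    diagnostics: dict = field(default_factory=dict)

    def validate(self) -> None:
        # A shared format allows shared validation: every series must
        # align with the year vector before models can be compared.
        n = len(self.years)
        assert len(self.ssb) == n and len(self.fbar) == n, "ragged series"

run = AssessmentOutput("SAM", "demo.stock", [2020, 2021],
                       [1500.0, 1420.0], [0.31, 0.28], {"converged": True})
run.validate()
print(run.framework, len(run.years))
```

With every framework emitting the same structure, ensemble code could consume outputs from any assessment without bespoke parsing.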

Processes to build ensemble models, develop performance metrics, algorithms, etc. require additional work before becoming fully functional for scientific advice. In our opinion, future studies should explicitly test the process of building the ensemble, comparing the feasibility of combining outcomes from models of varying complexity and exploring alternative methods of combining them.

Figure 1: In the case of model selection (a), candidate models are analysed to find the 'best' (weight set to one), which is then used for advice, while all the other models are discarded (weights set to zero). For ensemble modelling (b), all candidate models are kept and combined (curly bracket) using probabilities or weights (W_i). The greenish square represents an Expert Working Group, which lays the ground for advice. The blue arrow represents the advisory process, which tends to differ across constituencies.