Abstract

Research surveys are important to evaluate fishery resources’ spatial distribution and abundance. Although the underlying sampling is usually conceived with a focus on specific species, efficient designs can also collect data on secondary species. We present a framework to explore and evaluate the adequacy of alternative sampling designs for fishery research surveys aiming to maximize accuracy estimates of the secondary species abundance while maintaining the quality of the abundance estimates of primary species. A geostatistical model-based approach was developed considering the semi-continuous nature of the data and the excess of zero values commonly observed for secondary species. New sampling designs were defined according to optimization weights and evaluated based on the resulting prediction exactness. The framework was applied to the bottom trawl survey conducted along the Portuguese continental coast with European hake, Merluccius merluccius, as the primary species and thornback ray, Raja clavata, as the secondary species. The sampling design URSI provided the best balance between the accuracy for both primary and secondary species. The methodology can be replicated for other bottom trawl research surveys and an extended set of species. We recommend that a decision on which sampling design to adopt in future surveys should consider a cost-efficiency analysis.

Introduction

The fishery-independent quantitative species data collected during research surveys are important for monitoring the abundance of fishery resources, are commonly used as input for stock assessment models, and provide a basis for scientific advice on stock conservation status and fishing opportunities. Survey georeferenced data is also used to investigate the spatial and temporal distributions and abundance of species.

Although during the 1960s and 1970s stock trend analysis from stock assessments performed with virtual population analysis (VPA) were matched to fishery-dependent data (commercial catch per unit effort, CPUE), issues with the underlying assumption of a proportional relationship between CPUE and abundance emphasized the importance of using fishery-independent information. The research survey programmes that began in the 1960s were originally conceived as sources of biological information. Examples of these are the Woods Hole bottom trawl resource survey, which started in 1963 (Smith, 2002) and the International Bottom Trawl Survey in the North Sea, which started in 1965, initially aimed at juvenile herring in the central and southern North Sea (ICES, 2020). The Portuguese International Bottom Trawl Survey (PT-IBTS-Q4), conducted in Portuguese continental waters during Autumn, started in 1979 and was initially designed to monitor the distribution and abundance of the most important commercial species in the Portuguese trawl fishery (Cardador et al., 1997), although with a focus on estimating abundance indices of recruits of European hake Merluccius merluccius and horse mackerel Trachurus trachurus (Borges, 1984; Cardador et al., 1989). The design of the PT-IBTS-Q4 survey has changed to that of a multi-species survey, collecting data to estimate abundance and biomass indices in addition to biological parameters for other commercially important species, including fish (e.g. blue whiting Micromesistius poutassou, mackerel Scomber scombrus, and chub mackerel S. colias), crustaceans (e.g. Nephrops norvegicus and Parapente’s longirostris), and cephalopods (Chaves, 2018).

Fisheries research surveys should be efficient since marine survey programmes are expensive and time-consuming, but few studies have been published investigating the efficiency of multi-species sampling designs. Recently, Zhang et al. (2020) evaluated multi-species fisheries surveys considering several sampling and estimation methods and a wide range of sample sizes. The basis of their study was a simulation framework considering the joint distribution of multiple species. The design of multi-species or multi-purpose surveys requires attention to accuracy for both primary and secondary species, therefore demanding more complex and on-demand sampling schemes. Classical sampling theory and geostatistics address similar questions and result in unbiased estimates, but they are different. The former is related to design-based inference and the latter relies on model-based inference (Hoef, 2002). Geostatistical methods can be more efficient since they perform estimates closer to the true values (Hoef, 2002). In particular, a Bayesian model-based approach, allowing incorporating prior knowledge by defining prior distributions, may easily account for the particularities of the species under study and their habitat, as well as dealing with the different sources of variability present in complex data such as species distribution data. This exibility entails some additional computational cost.

In this study, we developed a framework and proposed methodologies to investigate whether the Portuguese bottom trawl survey can be more efficient by adopting an alternative spatial sampling design. The aim was to maximize the accuracy of abundance estimates for secondary species of the survey while maintaining the precision of the survey primary species abundance estimates. Different criteria were defined to achieve this main goal, prioritizing capturing the core distribution of a species, and minimizing the uncertainty regarding the species distribution. The species considered were the European hake, which is a primary species of the surveys, and the thornback ray Raja clavata, a secondary species, for which PT-IBTS-Q4 data has been used to estimate biomass indices. R. clavata is currently a priority species for stock assessment with production models (Pedersen and Berg, 2016) being managed at the European level by TAC (Total Allowable Catch).

This study evaluated survey efficiencies in estimating species biomass or abundance by comparing estimates derived from a random survey (under this design, fishing hauls are randomly selected) to others obtained with alternative spatial sampling designs. To achieve this, a common methodology was applied to the two selected species: one primary and one secondary. The procedure proposed was structured under a hierarchical Bayesian framework, which included the adjustment of a model-based approach that accounted for the semi-continuous nature of the data and the excess of zero values for thornback ray. We also considered the environmental variables that might impact the distribution and occurrence of the species under study. Finally, we compared the species model-based abundance estimates obtained under eight alternative non-random survey designs with those obtained from a random design. The eight non-random designs resulted from eight different ways of selecting the fishing hauls, depending on the specific priorities of each survey design. Comparison of the survey designs considered a balance between maximizing the accuracy of estimates and minimizing the uncertainty, and a balance between the objectives defined for both primary and secondary species.

Material and Methods

Description of survey design

The PT-IBTS-Q4 survey is carried out in Autumn by the Portuguese Institute for the Sea and Atmosphere (IPMA) using a bottom trawl (type Norwegian Campbell Trawl 1800/96 NCT) with a 20 mm codend mesh size and ground rope with bobbins. The survey adopted a stratified random sampling design during the period 1979–1989. Initially (1979–1980), the surveyed area along the Portuguese continental coast (from latitude 41º20′N to 36º30′N and from 20 to 750 m bottom depth) was divided in 15 strata (Cardador et al., 1997). The boundaries of each stratum were based on five geographic areas and three bathymetric levels, each stratum being divided into units of around 25 square nautical miles. In 1981, with the aim of decreasing the total variance of mean abundance indices by species, the fishing hauls (sampling units) were spread at random over 36 strata (combination of twelve geographic areas and three bottom depth intervals: [20–100 m[; [100–200 m[; [200–500 m[). Following analysis of the trade-off between biased estimates of species abundance with low variance and unbiased estimates with large variance, the survey design was changed to a predefined fishing hauls scheme in 1989 (ICES, 2002; ICES, 2017). The reduction of the abundance estimate variance obtained with the predefined sampling scheme was the dominant objective for the assessment of the southern stock of hake, at the time carried out with VPA tuning (ICES, 1990).

In 2005, the survey design changed to a sampling with partial replacement scheme (Cochran, 1977), recognising that the precision of the estimates for fish abundance trends over time could be improved by combining predefined and random fishing hauls, with a subset of hauls being matched from one survey to the next (ICES, 2005; Jardim and Ribeiro, 2007; ICES, 2020). A maximum of 96 fishing hauls, along the 36 survey strata (western and southern coasts of Portugal from 20–500 m), were sampled. These include 66 predefined hauls, distributed according to a regular grid of 5 × 5 nautical miles and considering that at least two fishing hauls should be made by stratum, with 30 hauls selected at random, carried out if ship time is available (ICES, 2002). Fishing hauls are carried out during daylight at a towing mean speed of 3.5 knots with a haul duration of 30 min (Chaves, 2018).

Species studied and environmental data

European hake is one of the most important species in western demersal surveys, given its high abundance, wide distribution, and importance in the trophic chain (Casey and Pereiro, 1995). This species is distributed from the coast of Mauritania to the western coasts of Norway and Iceland; it is also found in the North Sea, Skagerrak, Kattegat, and Mediterranean waters (Stehmann and Bürkel, 1984). Casey and Pereiro (1995) stated that European hake inhabits depths ranging from 30 m to >500 m, over mud/sand and rocky substrata.

The thornback ray is caught by Portuguese fisheries along the coast and is one of the most common elasmobranch and skate species found in European waters (Walker and Hislop, 1998; Machado et al., 2004). It is distributed along the eastern Atlantic from Norway and Iceland to South Africa, including the Mediterranean and Black Seas (Stehmann and Bürkel, 1984). The species is mainly found on hard seabed (e.g. gravel and pebble), in areas of intermediate to strong tidal currents (Ellis et al., 2005), from near shore to 300 m deep, with extreme records around 1000 m (Ebert and Dando, 2021). Since 1956, studies observed a decline in the occurrence of thornback ray in the North Sea (Walker and Heessen, 1996; Dulvy et al., 2000), but in recent years its stock has recovered in that area and throughout European waters (ICES, 2021).

Regarding the species distribution data, the exploitable biomass index of thornback ray, referring to the population encompassing fish with a total length over 50 cm, was calculated by haul and the unit used was kg/hour. The index of hake abundance was the number of fish caught per hour (nb/hour), which was also determined by fishing haul. The centroid position of each fishing haul (longitude, latitude) was calculated using the geographic coordinates of the start and final position of each fishing haul.

The studied area covered the western coast of Portugal (36º30′N to 41º20′N) (Figure S1). We adopted this restriction because the geomorphology of the Portuguese coast determines different directions of the southern coast and its bathymetry, which is associated with distinct oceanographic features and environmental conditions (Relvas et al., 2007) as well as community assemblages (Moura et al., 2020). Failure to consider this restriction may require an alternative model to detect the different effects of environmental conditions in the south coast, for instance. Therefore, the west coast data were used as an example of application since its dimension is larger. Hence, nine geographic areas were considered in our study, from CAM in the north to ARR in the south (Figure S1).

The type of substratum and bathymetry were the environmental variables considered as both are known to be related with the thornback ray habitat (Santos et al., 2021). Bathymetric and type of substratum sediment data were collected from the EMODnet central portal, accessible from http://www.emodnet.eu/. Bathymetric data was represented in meters. The type of substratum was classified into five categories (sand; rock and boulders; mud to muddy sand; mixed sediment and coarse-grained sediment) (Figure S2). We excluded non-trawlable zones, characterized by the rock and boulders substratum, from our prediction area (black zones represented in Figure S2).

The proposed methodology

Our methodology focused on proposing alternative survey designs that enhance the most accurate estimation of species abundance possible. Moreover, the proposed methodology is flexible: it allows application to other species groups or even more than two species, the introduction of different objectives, like minimizing the cost of performing a survey, and the incorporation of other survey constraints. For each survey design, 65 fishing hauls were considered, of which 54 were predefined by the Portuguese bottom trawl survey and 11 were selected in order to improve the fishery-independent information according to a particular objective. Indeed, the more data available on species, the more knowledge about them can be gained. This decision carries a financial and logistic cost, so it depends on the institution responsible for the surveys. The specific constraints governing the methodology for the sampling design were: do not consider non-trawlable zones, restrict the study region down to 200 m of depth, which corresponds to the main bathymetric limit distribution of R. clavata, and sample at least two fishing hauls in each stratum. Moreover, in order to prevent selecting sampling locations that are too close to each other when choosing new fishing hauls, we took into account a minimum distance between locations. For our study, we determined this distance based on the minimum distance observed between two locations in previous surveys. It was important to consider this requirement because selecting locations that are too close would result in redundant information about the species.

The four-steps framework to obtain design proposal for a survey and its application to the PT-IBTS-Q4 survey was as follows:

  1. Selection of the best species distribution model (SDM) under a Bayesian framework using the available observed survey data and estimation of the predicted surface on a fine grid for each species. The selection of the best SDM was based on goodness of fit and predictive quality criteria, using deviance information criterion (Spiegelhalter et al., 2002) and log-conditional predictive ordinate values (Roos and Held, 2011). For each species, various models, conjugating different combinations of functions of covariates, were compared. This step allowed to map abundance indices and its uncertainty, both indispensable for the next step. More details in SDM in the “Species distribution model” section .

  2. Selection of 11 new fishing hauls, according to sampling weights (presented Table 1), to be added to the predefined fishing hauls and establish alternative sampling survey designs. The sampling weights were computed using the outputs from the previous step, maps of abundance indices and uncertainty. The step is detailed in the “Alternative survey sampling designs” section.

  3. Prediction of abundance indices for each species at the observed locations of the benchmark survey (a chosen observed past survey) assuming the predicted estimates for the alternative sampling designs, established in the step 2, as the ground truth. This prediction procedure considered the best SDM and the predicted estimates from step 1.

  4. Evaluation of the alternative sampling designs by comparing the predicted values, obtained from step 3, with the observed values of the benchmark survey using the mean absolute and root mean square errors. Observed values of the benchmark survey corresponded to the observed catches for each species during the 2015 survey. More details in the “Assessment methodology of the alternative survey designs” section.

Table 1.

Weights used to select new fishing hauls and corresponding objectives.

Design mWeight |$w_{\boldsymbol{s}}^m$|Objective
UB|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M$|Minimize the uncertainty of what was not explained by the models applied to R. clavata and M. merluccius.
UBAB|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M( {1 - u_{\boldsymbol{s}}^M} )$|Same objective of measure 1 maximizing, at same time, the biomass or abundance of both species.
UBAR|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M$|Same objective of measure 1 maximizing the biomass of R. clavata.
UBSI|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M \times {q}_{\boldsymbol{s}}$|Same objective of measure 1 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
UR|$\hat{\sigma }_{\boldsymbol{s}}^R$|Minimize the uncertainty of what was not explained by the model for R. clavata.
URAR|$\hat{\sigma }_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} )$|Same objective of measure 5 maximising, at same time, the biomass of R. clavata.
URSI|$\hat{\sigma }_{\boldsymbol{s}}^R \times {q}_{\boldsymbol{s}}$|Same objective of measure 5 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
SimD|$\sqrt {{{( {\sqrt {u_{\boldsymbol{s}}^R} - \sqrt {u_{\boldsymbol{s}}^M} } )}}^2} $|Maximize the similarity between the distributions of the two species.
Rand|$\frac{1}{N}$|Random selection giving the same importance to all locations.
Design mWeight |$w_{\boldsymbol{s}}^m$|Objective
UB|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M$|Minimize the uncertainty of what was not explained by the models applied to R. clavata and M. merluccius.
UBAB|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M( {1 - u_{\boldsymbol{s}}^M} )$|Same objective of measure 1 maximizing, at same time, the biomass or abundance of both species.
UBAR|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M$|Same objective of measure 1 maximizing the biomass of R. clavata.
UBSI|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M \times {q}_{\boldsymbol{s}}$|Same objective of measure 1 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
UR|$\hat{\sigma }_{\boldsymbol{s}}^R$|Minimize the uncertainty of what was not explained by the model for R. clavata.
URAR|$\hat{\sigma }_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} )$|Same objective of measure 5 maximising, at same time, the biomass of R. clavata.
URSI|$\hat{\sigma }_{\boldsymbol{s}}^R \times {q}_{\boldsymbol{s}}$|Same objective of measure 5 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
SimD|$\sqrt {{{( {\sqrt {u_{\boldsymbol{s}}^R} - \sqrt {u_{\boldsymbol{s}}^M} } )}}^2} $|Maximize the similarity between the distributions of the two species.
Rand|$\frac{1}{N}$|Random selection giving the same importance to all locations.
Table 1.

Weights used to select new fishing hauls and corresponding objectives.

Design mWeight |$w_{\boldsymbol{s}}^m$|Objective
UB|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M$|Minimize the uncertainty of what was not explained by the models applied to R. clavata and M. merluccius.
UBAB|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M( {1 - u_{\boldsymbol{s}}^M} )$|Same objective of measure 1 maximizing, at same time, the biomass or abundance of both species.
UBAR|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M$|Same objective of measure 1 maximizing the biomass of R. clavata.
UBSI|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M \times {q}_{\boldsymbol{s}}$|Same objective of measure 1 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
UR|$\hat{\sigma }_{\boldsymbol{s}}^R$|Minimize the uncertainty of what was not explained by the model for R. clavata.
URAR|$\hat{\sigma }_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} )$|Same objective of measure 5 maximising, at same time, the biomass of R. clavata.
URSI|$\hat{\sigma }_{\boldsymbol{s}}^R \times {q}_{\boldsymbol{s}}$|Same objective of measure 5 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
SimD|$\sqrt {{{( {\sqrt {u_{\boldsymbol{s}}^R} - \sqrt {u_{\boldsymbol{s}}^M} } )}}^2} $|Maximize the similarity between the distributions of the two species.
Rand|$\frac{1}{N}$|Random selection giving the same importance to all locations.
Design mWeight |$w_{\boldsymbol{s}}^m$|Objective
UB|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M$|Minimize the uncertainty of what was not explained by the models applied to R. clavata and M. merluccius.
UBAB|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M( {1 - u_{\boldsymbol{s}}^M} )$|Same objective of measure 1 maximizing, at same time, the biomass or abundance of both species.
UBAR|$v_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} ) \times v_{\boldsymbol{s}}^M$|Same objective of measure 1 maximizing the biomass of R. clavata.
UBSI|$v_{\boldsymbol{s}}^R \times v_{\boldsymbol{s}}^M \times {q}_{\boldsymbol{s}}$|Same objective of measure 1 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
UR|$\hat{\sigma }_{\boldsymbol{s}}^R$|Minimize the uncertainty of what was not explained by the model for R. clavata.
URAR|$\hat{\sigma }_{\boldsymbol{s}}^R( {1 - u_{\boldsymbol{s}}^R} )$|Same objective of measure 5 maximising, at same time, the biomass of R. clavata.
URSI|$\hat{\sigma }_{\boldsymbol{s}}^R \times {q}_{\boldsymbol{s}}$|Same objective of measure 5 giving importance to geographic strata of the study are throughout |${q}_{\boldsymbol{s}}$|⁠.
SimD|$\sqrt {{{( {\sqrt {u_{\boldsymbol{s}}^R} - \sqrt {u_{\boldsymbol{s}}^M} } )}}^2} $|Maximize the similarity between the distributions of the two species.
Rand|$\frac{1}{N}$|Random selection giving the same importance to all locations.

Species distribution model

Although the SDM allowed studying the relationship between the species distribution and the environment, the main objective of using a SDM was to predict species abundance for unsampled locations that constituted a fine grid and possible locations of the alternative designs.

The SDM for each species was constructed using observations of the Autumn PT-IBTS-Q4 survey (2013–2016) by assuming that, for the environmental variables under study, no major differences occurred across the years. Moreover, we considered that the observations of the surveys complemented each other, being treated as a single collection in time. To test the assumption that both species have a similar spatial distribution over surveys, we applied the Knox and Mantel statistical tests for space-time interaction (Meyer et al., 2016) to our datasets.

According to the SDM adopted, the spatial process for the biomass index of R. clavata or for the abundance index of M. merluccius at location |${\boldsymbol{s}}$| was defined by |$Y( {\boldsymbol{s}} )$| and the presence/absence sub-process, represented by |$Z( {\boldsymbol{s}} )$|⁠, took the value 0 if the species was not observed at location |${\boldsymbol{s}}$| and 1 otherwise. Consequently, given that |$Z\ ( {\boldsymbol{s}} ) = \ 1$|⁠, |$Y( {\boldsymbol{s}} )$| took the positive value of biomass or abundance index observed at location |${\boldsymbol{s}}$|⁠. Under this SDM it was assumed that |$Z( {\boldsymbol{s}} )$| came from a Bernoulli distribution with a probability of success, |$p( {\boldsymbol{s}} )$|⁠, and that |$Y( {\boldsymbol{s}} )$|⁠, given that |$Z\ ( {\boldsymbol{s}} ) = \ 1$|⁠, followed a Gamma distribution with the shape parameter, |$a( {\boldsymbol{s}} )$|⁠, and scale parameter, |$b( {\boldsymbol{s}} )$|⁠. The adequacy of a Gamma distribution to model the strictly positive values was evaluated by the Kolmogorov-Smirnov test (Frank and Massey, 1951). This two-part SDM was fitted under the Bayesian framework and was defined as:

(1)

and

(2)

The probability of occurrence, |$p( {\boldsymbol{s}} )$|⁠, was modelled through the logit link function |$log( {\frac{{p( {\boldsymbol{s}} )}}{{1 - p( {\boldsymbol{s}} )}}} )$| and the expected abundance or biomass index, |$\mu \ ( {\boldsymbol{s}} ) = \ a( {\boldsymbol{s}} )/b( {\boldsymbol{s}} )$|⁠, through its logarithm. The |$f( . )$| denoted possible transformation functions (linear effects, including for categorical covariates, linear splines (Zuur et al., 2017), logarithm, square root, and inverse) or smoother functions of environmental covariates, |${X}_{1,j}( {\boldsymbol{s}} )$|⁠. The terms |${\alpha }_1$| and |${\alpha }_2$| represented the intercepts. The prior distributions for parameters |${\alpha }_i$| and k were defined as Gaussian with mean zero and variance 1000, so that they are less informative since no prior knowledge about these parameters was considered.

|$W( {\boldsymbol{s}} )$| represented a spatial random effect modelled as a Gaussian Markov random field (GMRF) and it was further assumed to be an intrinsically stationarity process of mean zero. |$W( {\boldsymbol{s}} )$| resulted from an approximation of a latent Gaussian field (GF), using a method based on stochastic partial differential equations (SPDE), as proposed by Lindgren et al. (2011). The SPDE approach allows approximation of a spatial continuous field, represented by a Matérn covariance function, by a Markov field. This approximation was adopted due to its computational advantages. Parameterization was carried out in terms of the marginal variance of the data, |${\sigma }^2$|⁠, and the radius of influence, |$\phi $|⁠, whose prior distributions were specified under penalized complexity prior framework (Lindgren et al., 2011). The two-part probability model could be represented as the product:

(3)

where [.] means “distribution of” and.|. means “conditional to”.

Alternative survey sampling designs

As described in the “The proposed methodology” section, the step 2 consists of selecting new fishing hauls (⁠|${n}_1 + {n}_2$|⁠) to establish the alternative survey designs. This step was performed in two phases. First, 7 (⁠|${n}_1$|⁠) fishing hauls were selected to guarantee the survey condition—at least two sampling units by stratum (combination of nine geographic areas and two bottom depth intervals: ([0–100 m[; [100–200 m[). At second phase, and since the proposal of new sampling design allowed to add 11 new fishing, 4 (⁠|${n}_2$|⁠) hauls were selected spread over the entire prediction region.

The criteria for constructing the alternative designs were defined to obtain accurate species distribution over the study region using observations of research surveys. In a more specific way, the criteria aimed to minimize the uncertainty resulting from the modelling process, that is, the uncertainty of what was not explained by the SDM. Furthermore, some of the criteria presented a complementary objective such as the maximization of the abundance index of both or either species, or the prioritization of specific regions according to a stratum ranking. Each criterion was described by a sample weight formula (Table 1). The design names were attributed to be acronyms of the respective objective. UB aimed to minimize the Uncertainty resulting from Both species, UBAB aimed to minimize the Uncertainty resulting from Both species and maximize the Abundance of Both species, UBAR aimed to minimize the Uncertainty resulting from Both species and maximize the Abundance of R. clavata, UBSI aimed to minimize the Uncertainty resulting from Both species given the Strata Importance, UR aimed to minimize the Uncertainty from R. clavata, URAR aimed to minimize the Uncertainty resulting from R. clavata and maximize the Abundance of R. clavata, URSI aimed to minimize the Uncertainty resulting from R. clavata given the Strata Importance, SimD aimed to maximize the Similarity of Distribution of species, and Rand represented the random design. For reasons of terminological simplicity, hereafter the set of all designs except the Rand design were referred to as non-random conditional to the SDM used for weight construction. The corresponding weights, computed for each location of the prediction surface, were based on two components:

(4)

and

(5)

where |$\hat{y}_{\boldsymbol{s}}^{\boldsymbol{i}}$| referred to the median prediction estimate of biomass or abundance index and |$\hat{\sigma }_{\boldsymbol{s}}^{\boldsymbol{i}}$| represented the standard deviation of spatial effects for location |${\boldsymbol{s\ }} = \{ {{{\boldsymbol{s}}}_1,{{\boldsymbol{s}}}_2, \ldots ,{{\boldsymbol{s}}}_N} \}{\boldsymbol{\ }}$| and species |$i\ = \{ {R,M} \}\ $| such that R identifies thornback ray and M identifies hake. |$\hat{\sigma }_{\boldsymbol{s}}^{\boldsymbol{i}}$| was interpreted as an uncertainty measure of what was not explained by the covariates, that is, it represents the remaining variance after considering the explanatory variables. Both |$\hat{y}_{\boldsymbol{s}}^{\boldsymbol{i}}$| and |$\hat{\sigma }_{\boldsymbol{s}}^{\boldsymbol{i}}$| were standardized by the corresponding maximum, since the ranges of abundance and standard deviations of spatial effects were different for the two species.

The weights defined for designs UBSI and URSI considered the geographic area, through the component |${q}_{\boldsymbol{s}}$|⁠, which took integer values from 1 to 9 ranging from higher to lower priority. Locations from the same area have the same value for |${q}_{\boldsymbol{s}}$|⁠. The ranking was based on observed stratified mean for each stratum, such that the stratum with component value of 1 corresponds to the stratum with the highest stratified mean during the study period and so on. Thus, assuming that the uncertainty is constant over the study region, the locations from the geographic area identified by |${q}_{\boldsymbol{s}} = \ 8$| are more likely to be sampled than the locations from the geographic area identified by |${q}_{\boldsymbol{s}} = \ 9$|⁠, the locations from the geographic area identified by 7 are more likely to be sampled than the locations from the geographic areas identified by 8 and 9 and so on. The prioritization of strata was included as another way of maximizing the species abundance index. Weight 8 (maximize the similarity between the distributions of the two species) was aligned with that suggested by (Pennino et al., 2016) to evaluate the overlapping of predictions from two datasets.

Therefore, the new additional fishing hauls of alternative design m were defined as locations that minimize the weights |$w_{\boldsymbol{s}}^m$|⁠: (i) for each incomplete stratum, leading to the proposal of seven new fishing hauls to accomplish the survey condition together with 54 predefined hauls (at least two fishing hauls by stratum); (ii) for the entire study area to select the remaining four locations without any restriction. Steps (i) and (ii) were performed for each design m presented in Table 1. This allowed us to move forward with the proposal of the alternative survey sampling designs, which needed to be evaluated among themselves and compared with a benchmark.

Assessment methodology of the alternative survey designs

The step 3 included the use of a survey that took place in the past as a benchmark. In our case study, we chose the 2015 survey because it presented a lower rate of zeros in the thornback ray biomass observed during the 2013–2016 period. The predicted abundance indices, obtained from step 1, for the locations that constituted the alternative designs, obtained from step 2, were used as input data for a new application of the best SDM (selected in step 1). Then, for each proposed design, we predicted the abundance indices for each species at the observed locations, assuming the design as the ground truth. We end up with eight databases of predicted abundance indices for each species. Each database holds the selected locations (longitude and latitude), all covariates (in our case, the corresponding bathymetry and the substratum type), and the predictive posterior medians for the secondary and primary species (in our case, thornback ray biomass and hake abundance indices).

In the step 4, the biomass or the abundance index predicted values for each species were compared with the values observed in the benchmark survey using classic assessment metrics, such as mean absolute error (MAE) and root mean square error (RMSE). The combination of MAE and RMSE metrics are often likewise used to assess model performance (Chai and Draxler, 2014). This step aimed to evaluate how the alternative design could reflect what was defined as ground truth.

Furthermore, with the aim of further evaluating the non-random eight survey designs, we suggested contrasting them with the design currently adopted by IPMA (Rand design presented in Table 1), which had a random selection of locations according to a homogeneous spatial Poisson process. Firstly, we simulated 200 sets of fishing hauls, allowing the random selection of new locations (in our case, 11) as long as they meet the predefined design constraints (in our case, having at least two sampling units by stratum). Secondly, we computed two assessment metrics, MAE and RMSE, by comparing the predicted values of biomass and abundance index obtained from each of the 200 simulated sets with those observed in the benchmark survey. Finally, we considered the approximated empirical distribution of the 200 values obtained for the assessment metrics as an approximation of their theoretical distribution function. That way, the design Rand was used for evaluating the performance of the eight proposed non-random sampling designs.

In a concise and generic way, the proposed process is described by a flowchart (Figure 1). Observed data are the survey data of thornback ray (from 2013 to 2016) and European hake (2015 and 2016), whose survey was described in the “Species studied and environmental data” section, while 2015 survey data was used as benchmark at point 3 of Figure 1. The reason for different study periods for primary and secondary species lies in the characteristics of the survey data. In particular, the thornback ray survey data are characterized by a large number of zeros, which makes it difficult to fit the model and establish a relationship with the environment that can lead to accurate predictions. So, to improve the quality of predictions, dataset was augmented to consider data from surveys conducted between 2013 and 2016. In contrast, the hake abundance survey data presents a more favourable distribution, with fewer zeros and a less heavy tail compared to the thornback ray data.

SDM was performed using the INLA approach in R software. The R code corresponding to the survey design procedure and its assessment is available on GitHub (https://github.com/SilvaPDaniela/Evaluation-of-survey-designs-for-species-distributionestimation).

Flowchart of the methodology applied in this study to evaluate alternative survey designs.
Figure 1.

Flowchart of the methodology applied in this study to evaluate alternative survey designs.

Results

Frequency of occurrence and observed biomass of thornback and abundance of hake

During the period 2013–2016, the thornback ray biomass index was computed in 212 fishing hauls along the western coast of Portugal, with this information available for at least 49 different fishing hauls each year. The number of fishing hauls with a strictly positive value for the catch of thornback ray varied annually between 6 (2014) and 15 (2015) (Table 2). Between 2015 and 2016, the hake abundance index was computed for 101 fishing hauls. The number of hauls with strictly positive values for the catch of hake was 50 and 44, in 2015 and 2016, respectively (Table 2). The empirical distribution of positive hake catches (nb/hour) was right skewed, with a range between 2 and 3933 individuals and a mean value of 411 fish per hour (Figure S3).

Table 2.

Number (nb) of recorded fishing hauls, number of fishing hauls with a strictly positive catch, and percentage of zero catches by species, during PT-IBTS-Q4 surveys conducted between 2013 and 2016.

SpeciesFishing hauls/catch2013201420152016Total
Thornback raynb of recorded fishing hauls57495452212
nb of fishing hauls with presence96151141
% of zero catch8488727981
Hakenb of recorded fishing hauls5348101
nb of fishing hauls with presence504494
% of zero catch687
SpeciesFishing hauls/catch2013201420152016Total
Thornback raynb of recorded fishing hauls57495452212
nb of fishing hauls with presence96151141
% of zero catch8488727981
Hakenb of recorded fishing hauls5348101
nb of fishing hauls with presence504494
% of zero catch687
Table 2.

Number (nb) of recorded fishing hauls, number of fishing hauls with a strictly positive catch, and percentage of zero catches by species, during PT-IBTS-Q4 surveys conducted between 2013 and 2016.

SpeciesFishing hauls/catch2013201420152016Total
Thornback raynb of recorded fishing hauls57495452212
nb of fishing hauls with presence96151141
% of zero catch8488727981
Hakenb of recorded fishing hauls5348101
nb of fishing hauls with presence504494
% of zero catch687
SpeciesFishing hauls/catch2013201420152016Total
Thornback raynb of recorded fishing hauls57495452212
nb of fishing hauls with presence96151141
% of zero catch8488727981
Hakenb of recorded fishing hauls5348101
nb of fishing hauls with presence504494
% of zero catch687

The highest exploitable biomass index of thornback ray (kg/hour) was observed in the area near Lisbon in 2015 and 2016 (Figure S1). The annual percentage of zero catches of thornback ray was high, varying between 72% and 88% (Table 2). The empirical distribution of positive of thornback ray catches (kg/hour) during 2013–2016 was also right skewed, with a biomass index range of 2.0–24.9 kg/hour and a mean value of 7.8 kg/hour (Figure S3). The abundance index of hake (nb/hour) was higher in 2015 than 2016, particularly in the northern part of the western Portuguese coast (Figure S1). The percentage of zero catches for this species was low, 6% in 2015 and 8% in 2016 (Table 2).

Species distribution model

For both species the results of the Knox and Mantel statistical tests gave high p-values, indicating that the assumption of both species have a similar spatial distribution over surveys was not violated. In addition, the results of the Kolmogorov-Smirnov test confirmed the adequacy of a Gamma distribution to model the strictly positive values, which indicated p-values of 0.808 and 0.603 for thornback ray biomass and hake abundance, respectively.

Thornback ray biomass

The model was fitted to these data, considering the SPDE approach and the substratum sediment type and bottom depth covariates as fixed effects. The probability of species occurrence decreases up to about 73 m of depth and then increases. From 175 m of depth, the depth has a positive effect on the occurrence (Figure 2). Regarding substratum sediment type, the sand sediment seems to reduce the occurrence probability of thornback ray in 50% when compared with the remaining substrate types (Table 3). The thornback ray occurrence results indicated some uncertainty associated with both environmental effects since the depth did not become relevant at any instant of its domain and the parameter associated to the substratum type was not statistically relevant, but we kept the corresponding covariate to achieve better predictive performance of the model. The biomass index of thornback ray decreases up to 137 m of depth, then increases up to 208 m and then decreases again (Figure 2). However, the areas that enhance the highest values of biomass presents a bathymetry between 182 and 268 m. Analysis of the posterior distributions of spatial covariance parameters for the thornback ray biomass model indicated that spatial autocorrelation was negligible beyond 55 km. The mode for the marginal standard deviation of the spatial effects was 1.2 (second and third panels in the upper row of Figure S4). The mean precision of the Gamma observations was estimated as 5.9 (Figure S4, first panel, upper row).

Depth effects on the occurrence (first column) and abundance/biomass index (second column) for the thornback ray (first row) and the European hake (second row) resulting from the modelling of biomass and abundance. Vertical lines represent the 80% quantiles for the observed depth.
Figure 2.

Depth effects on the occurrence (first column) and abundance/biomass index (second column) for the thornback ray (first row) and the European hake (second row) resulting from the modelling of biomass and abundance. Vertical lines represent the 80% quantiles for the observed depth.

Table 3.

Main statistics of the posterior distributions for fixed effects for the thornback ray biomass index.

ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept−1.0270.564−2.197−1.0120.051−0.992
Substratum (Sand sediment)−0.6800.479−1.635−0.6760.248−0.668
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept1.4650.3220.7541.4872.0421.525
ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept−1.0270.564−2.197−1.0120.051−0.992
Substratum (Sand sediment)−0.6800.479−1.635−0.6760.248−0.668
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept1.4650.3220.7541.4872.0421.525

S.D. is the standard deviation and : refers to interaction between covariates. |${\chi }_q$| identifies the quantile of probability q. |$Z( {\boldsymbol{s}} )$| identifies the process of the occurrence probability and |$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$| represents the process of biomass under occurrence (positive response).

Table 3.

Main statistics of the posterior distributions for fixed effects for the thornback ray biomass index.

ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept−1.0270.564−2.197−1.0120.051−0.992
Substratum (Sand sediment)−0.6800.479−1.635−0.6760.248−0.668
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept1.4650.3220.7541.4872.0421.525
ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept−1.0270.564−2.197−1.0120.051−0.992
Substratum (Sand sediment)−0.6800.479−1.635−0.6760.248−0.668
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept1.4650.3220.7541.4872.0421.525

S.D. is the standard deviation and : refers to interaction between covariates. |${\chi }_q$| identifies the quantile of probability q. |$Z( {\boldsymbol{s}} )$| identifies the process of the occurrence probability and |$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$| represents the process of biomass under occurrence (positive response).

The posterior median of the predictive distribution for the thornback ray biomass index indicated that thornback biomass is higher near to Lisbon and at shallower waters despite the observed high values of biomass at the north of Lisbon and far away from the coast (Figure 3, left panel). Unsurprisingly, the standard deviation estimates of spatial effects were lower near to sampled locations since the estimation in these locations is more accurate (Figure 3, right panel).

Map of the posterior median of the predictive distribution of biomass index (left panel) and posterior standard deviations (right panel) of spatial effects of thornback ray.
Figure 3.

Map of the posterior median of the predictive distribution of biomass index (left panel) and posterior standard deviations (right panel) of spatial effects of thornback ray.

Hake abundance

Generically, the probability of occurrence and the abundance index of hake increase with the depth. In areas with a bottom depth up to 90 m the abundance index of hake increases by 15 fish per hour for each increment of ∼2.7 m, then the effect of depth is almost constant (Figure 2). Results also indicated that hake is nearly twice as abundant in mud and muddy sand locations than in other type of sediment substratum (Table 4). Regarding the spatial covariance parameters for the hake abundance model, no spatial autocorrelation is expected for distances over 118 km and the mode for the marginal standard deviation of the spatial effects was 1.1 (second and third panels in the lower row of Figure S4). The precision mean for Gamma observations was estimated as 0.8 (first lower row panel Figure S4).

Table 4.

Main statistics of posterior distributions for fixed effects for the hake abundance index.

ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept5.7742.0641.7665.7539.9125.714
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept−9.7413.242−16.203−9.720−3.400−9.681
Substratum (Mud)0.8120.3080.2130.8091.4240.804
ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept5.7742.0641.7665.7539.9125.714
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept−9.7413.242−16.203−9.720−3.400−9.681
Substratum (Mud)0.8120.3080.2130.8091.4240.804

S.D. is the standard deviation and : refers to interaction between covariates. |${\chi }_q$| identifies the quantile of probability q. |$Z( {\boldsymbol{s}} )$| identifies the process of the occurrence probability and |$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = {\rm{\ }}1} )$| represents the process of abundance under occurrence (positive response).

Table 4.

Main statistics of posterior distributions for fixed effects for the hake abundance index.

ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept5.7742.0641.7665.7539.9125.714
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept−9.7413.242−16.203−9.720−3.400−9.681
Substratum (Mud)0.8120.3080.2130.8091.4240.804
ProcessCovariateMeanS.D.|${\chi }_{0.025}$||${\chi }_{0.5}$||${\chi }_{0.975}$|Mode
|$Z( {\boldsymbol{s}} )$|Intercept5.7742.0641.7665.7539.9125.714
|$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = \ 1} )$|Intercept−9.7413.242−16.203−9.720−3.400−9.681
Substratum (Mud)0.8120.3080.2130.8091.4240.804

S.D. is the standard deviation and : refers to interaction between covariates. |${\chi }_q$| identifies the quantile of probability q. |$Z( {\boldsymbol{s}} )$| identifies the process of the occurrence probability and |$Y( {\boldsymbol{s}} )|( {Z\ ( {\boldsymbol{s}} ) = {\rm{\ }}1} )$| represents the process of abundance under occurrence (positive response).

The posterior median of the predictive distribution for the hake abundance index indicated that hake abundance is higher in the northern area and at deeper waters (Figure 4, left panel). In accordance with what was also observed for the thornback ray biomass model, the uncertainty indicators for hake estimates were lower near the sampled locations (Figure 4, right panel).

Map of the posterior median of the predictive distribution of the abundance index (left panel) and posterior standard deviation of spatial effects (right panel) for hake.
Figure 4.

Map of the posterior median of the predictive distribution of the abundance index (left panel) and posterior standard deviation of spatial effects (right panel) for hake.

Evaluation of alternative survey designs

In the case of the thornback ray, the survey sampling designs UB, UR, and SimD provided better results for MAE and RMSE criteria (Table 5). Regarding hake abundance, survey designs UBAR, UBSI, URAR, and URSI produced lower MAE and RMSE values than the other designs (Table 5).

Table 5.

Values of the assessment metrics computed for the thornback ray and hake for each proposed measure, to be used in performance evaluation of the eight alternative non-random survey designs.

Alternative designR. clavataM. merluccius
MAERMSEMAERMSE
UB2.4904.783362.057685.102
UBAB3.1956.013349.535663.127
UBAR2.9645.581343.200651.899
UBSI2.7115.110345.017659.000
UR2.5295.090356.433683.993
URAR2.9265.577347.339653.221
URSI2.6235.170347.668668.400
SimD2.5545.055360.706679.771
Alternative designR. clavataM. merluccius
MAERMSEMAERMSE
UB2.4904.783362.057685.102
UBAB3.1956.013349.535663.127
UBAR2.9645.581343.200651.899
UBSI2.7115.110345.017659.000
UR2.5295.090356.433683.993
URAR2.9265.577347.339653.221
URSI2.6235.170347.668668.400
SimD2.5545.055360.706679.771

Grey lines indicate the alternative survey designs that provided the best balance for primary and secondary species. MAE mean absolute error; RMSE root mean square error. Best designs are the ones that promote lower values for MAE and RMSE.

Table 5.

Values of the assessment metrics computed for the thornback ray and hake for each proposed measure, to be used in performance evaluation of the eight alternative non-random survey designs.

Alternative designR. clavataM. merluccius
MAERMSEMAERMSE
UB2.4904.783362.057685.102
UBAB3.1956.013349.535663.127
UBAR2.9645.581343.200651.899
UBSI2.7115.110345.017659.000
UR2.5295.090356.433683.993
URAR2.9265.577347.339653.221
URSI2.6235.170347.668668.400
SimD2.5545.055360.706679.771
Alternative designR. clavataM. merluccius
MAERMSEMAERMSE
UB2.4904.783362.057685.102
UBAB3.1956.013349.535663.127
UBAR2.9645.581343.200651.899
UBSI2.7115.110345.017659.000
UR2.5295.090356.433683.993
URAR2.9265.577347.339653.221
URSI2.6235.170347.668668.400
SimD2.5545.055360.706679.771

Grey lines indicate the alternative survey designs that provided the best balance for primary and secondary species. MAE mean absolute error; RMSE root mean square error. Best designs are the ones that promote lower values for MAE and RMSE.

Designs UBAB, UBAR, and URAR, whose weights consider the standardized estimates of abundance/biomass for one or both species, presented higher MAE and RMSE values for the thornback ray. Indeed, maximizing the biomass and/or abundance indices estimates provided a less accurate prediction process, highlighting the importance of minimizing the standard deviation estimates of spatial effects, which can be seen as a measure of uncertainty in the estimation process. This feature was confirmed by the results for hake abundance index estimates, where designs UR and SimD, which do not consider the uncertainty for hake, presented two of the worst MAE and RMSE results despite the worst results coming from the UB design.

Results confirmed that the survey sampling designs UB, UR, URSI, and SimD resulted in higher accuracy for thornback ray biomass than that obtained with a random selection of fishing hauls (Figure 5). Regarding estimation of the hake abundance index, all sampling designs outperformed the design Rand (Figure 5). Maps of each non-random alternative designs and of a replica of Rand design are provided in Figure S5.

Empirical distribution of MAE (left panels) and RMSE (right panels) densities resulting from the estimation of thornback ray biomass (top panels) and hake abundance (bottom panels) indices. The densities, represented by thin black lines, were approximated based on the 200 simulated designs Rand (last row of Table 1). Vertical lines represent the best alternative surveys (with the lowest exact values of MAE and RMSE) for thornback ray (solid lines) and hake (dashed lines). See Table 1 for the definition and objectives of each alternative design.
Figure 5.

Empirical distribution of MAE (left panels) and RMSE (right panels) densities resulting from the estimation of thornback ray biomass (top panels) and hake abundance (bottom panels) indices. The densities, represented by thin black lines, were approximated based on the 200 simulated designs Rand (last row of Table 1). Vertical lines represent the best alternative surveys (with the lowest exact values of MAE and RMSE) for thornback ray (solid lines) and hake (dashed lines). See Table 1 for the definition and objectives of each alternative design.

Discussion

The aim of the present study was to propose alternative sampling designs for the Portuguese bottom trawl research survey, focusing on the improvement of abundance estimates for one secondary species, while maintaining the precision of the abundance index estimates of one primary species. Although the URSI survey design did not produce the greatest results, it struck the best balance of estimates accuracy between the primary and secondary species. It ranked fourth for thornback ray, third for hake, and outperformed most of the replicas of the Rand design. According to URSI design, the 11 new fishing hauls were selected to minimize the uncertainty not explained by the thornback ray biomass model, giving relevance to specific geographic areas where the species is likely to be more abundant.

Under the present study a four-step framework was developed that included (i) the prediction species abundance indices using a SDM approach; (ii) the construction alternative designs based on the predicted estimates; (iii) the calculation of assessment metrics for each alternative designs; and (iv) the comparison of alternative designs using the assessment metrics.

SDMs allow the combination of species occurrence and/or abundance data with environmental information, incorporate both spatial and temporal variability, and may have either a single or multi-species focus (Thorson and Barnett, 2017; Coelho et al., 2018; Martínez-Minaya et al., 2018; Azevedo and Silva, 2020). Furthermore, the combination of SDMs with geostatistical methods proved to be effective in handling complex data that is spatial correlated (Martínez-Minaya et al., 2018; Pennino et al., 2019; Izquierdo et al., 2022).

We focused on the spatial dimension of the abundance indices for both species, as temporal variation was not found to be significant across the years for the two species analysed, while also taking into consideration the sample size. However, species abundance commonly varies in space and in time (Hefley and Hooten, 2016). As a result, a temporal component, like fixed or random effects either structured or unstructured, should be considered, especially for long time series. Moreover, incorporating appropriate environmental covariates that change over time may help to explain temporal variation in species abundance indices.

The proposed alternative designs were developed using the best performing SDM obtained for each species. Consequently, their effectiveness and adequacy are constrained by the limitations of the data and models used. Our dataset suffers from a small sample size, which inevitably reduces the accuracy of our estimates. Ideally, to provide reliable “ground truth” and eliminate the need for modelling, species abundance indices should be measured across the entire study area and over a long time series. Unfortunately, in our case study, as in many others, financial and logistical constraints prevented us from accessing this ground truth. In fact, our proposed framework could be improved with larger datasets to enhance its performance.

The comparison of alternative designs considered the predicted and observed values for abundance indices in a specific year, 2015. The 2015 survey data were used to build the SDM (R. clavata: 2013–2016; M. merluccius: 2015–2016) and to evaluate the sampling procedure, through assessment metrics. This procedure, that was adopted in our case-study due to the shortage of samples for R. clavata may have biased the results. We note, however, that our study focused on the selection procedure of new fishing hauls and not try to understand how the model fitted the data. Nevertheless, we recommend using disjoint datasets in the two steps (building SDM and evaluating sampling procedure) when the size data are relevantly larger than the one used in our study. Another modelling tools (like machine learning techniques, non-parametric approaches, etc) can be used to get the ground truth, which is the objective of using the SDM.

It is also important to note that the non-random survey designs resulted in greater precision. This could be because the design objective was to minimize uncertainty and the metrics used to evaluate the designs were based on prediction ability, which is closely related to prediction uncertainty. Another noteworthy result was the small differences among the alternative designs when comparing the assessment metrics values. Nevertheless, the small number of new fishing locations as well as the survey condition of requiring at least two hauls per stratum in our case study limited our ability to compare the alternative designs. In fact, augmenting the number of new hauls would result in greater differences and consequently facilitate the adequacy assessment of the proposed alternative designs.

Conclusions

Improving the precision and quality of research surveys is crucial for effective ecosystem conservation and fisheries resource management. In this study, we used spatial modelling techniques to estimate the spatial distribution of primary and secondary species caught in research vessel surveys and evaluated alternative survey sampling designs to improve survey precision.

Despite the limitations of our procedure for proposing survey sampling designs, it represents a significant contribution towards improving species distribution survey data. In particular, the adoption of the URSI design for the PT-IBTS-Q4 survey will lead to an improvement on the accuracy of thornback ray biomass estimates, without jeopardising the estimation of the hake abundance index.

Our analysis focused on two fish species in Portuguese waters but, more importantly, the framework and the methodology can be replicated for other bottom trawl research surveys and sets of species. Furthermore, the proposed framework is flexible enough to incorporate other sampling restrictions and to respond to other sampling objectives.

For further work, we suggest considering financial and time-consuming costs in the sampling weights and defining a larger number of new fishing hauls to perform a fairer comparison among the alternative designs. Additionally, we recommend a cost-efficiency analysis to determine the most suitable sampling design for surveys, which could include comparing survey duration and displacement costs should consider a cost-efficiency analysis. This analysis can easily be accomplished by, for instance, comparing the costs to fishing sites, among other relevant survey costs.

Acknowledgement

PT-IBTS-Q4 survey was carried out within IPMA National Programme for Biological Sampling (PNAB) co-financed by national funds and the European Maritime and Fisheries Fund, EMFF. We thank the survey team that collected the data, Corina Chaves for compiling and performing the survey data quality control. We would like to extend our sincere acknowledgement to the reviewers who dedicated their time, expertize, and valuable insights to review and provide constructive feedback on our work. Their rigorous evaluation and thoughtful comments have greatly contributed to improving the quality and clarity of our research.

Data availability statement

The sampling data will be shared on reasonable request to co-author B. Pereira ([email protected]).

Author contributions

DS, RM, and IF developed the methodology. DS analysed the data and implemented the method. All authors contributed to the interpretation of the results and discussed ideas. DS led the writing of the manuscript, with contribution from RM, BP, MA, and IF. All authors approved the final version of the manuscript.

Conflict of interest

The authors have no conflicts of interest to declare.

Funding

This study was supported by Portuguese funds through the Centre of Mathematics and the Portuguese Foundation for Science and Technology (FCT), within Individual Scholarship PhD PD/BD/150535/2019 and projects UIDB/00013/2020, UIDP/00013/2020, and PTDC/MAT-STA/28243/2017.

References

Azevedo
M.
,
Silva
C
.
2020
.
A framework to investigate fishery dynamics and species size and age spatio-temporal distribution patterns based on daily resolution data: a case study using Northeast Atlantic horse mackerel
.
ICES Journal of Marine Science
.
77
:
2933
2944
.

Borges
M. d. F
.
1984
.
Evaluation of the results on horse mackerel (Trachurus Trachurus L.) of a series of young fish surveys in the Portuguese waters (Division IX)
.
ICES CM
.
26
:
45
.

Cardador
F.
,
Dinis
H.
,
Goñi
R
,
Pereiro-Muñoz
F. J
,
Sánchez
F.
.
1989
.
Report of the Study Group on Recruitment Indices of Hake from the Southern Stock. ICES Expert Group reports, 1989
. https://hdl.handle.net/10508/10986
(last accessed on 21 December, 2022)

Cardador
F.
,
Sánchez-Delgado
F.
,
Pereiro-Muñoz
F. J.
,
Borges
M. F
,
Caramelo
A. M.
,
Azevedo
M.
,
Silva
A
, et al.
1997
.
Groundfish surveys in the Atlantic Iberian waters (ICES Divisions VIIIc and IXa): history and perspectives
.
ICES CM 1997/Y
.
8
:
30
pp.

Casey
J.
,
Pereiro
J
.
1995
.
European hake (M. merluccius) in the North-east Atlantic
.
Hake: Fisheries, Ecology and Markets
. Ed. by
Alheit
J.
,
Pitcher
T.
Chapman & Hall
,
Dordrecht
.
doi: 10.1007/978-94-011-1300-7_5

Chai
T.
,
Draxler
R. R
.
2014
.
Root mean square error (RMSE) or mean absolute error (MAE)?—arguments against avoiding RMSE in the literature
.
Geoscientific Model Development
.
7
:
1247
1250
.

Chaves
C
.
2018
.
Relatório da Campanha Demersal 2018 - 02041018 IBTS PT-PGFS-Q4-2018
.
Relatórios de Campanha
.
45
pp.

Cochran
W. G
.
1977
.
Sampling Techniques
. 3rd edition,
John Wiley & Sons
,
New York, NY
.

Coelho
R.
,
Mejuto
J.
,
Domingo
A.
,
Yokawa
K.
,
Kwang-Ming
L.
,
Cortés
E.
,
Romanov
E. V
, et al.
2018
.
Distribution patterns and population structure of the blue shark (Prionace glauca) in the Atlantic and Indian Oceans
.
Fish and Fisheries
.
19
:
90
106
.

Dulvy
N.
,
Metcalfe
J. D.
,
Glanville
J.
,
Pawson
M. G.
,
Reynolds
J. D
.
2000
.
Fishery stability, local extinctions, and shifts in community structure in skates
.
Conservation Biology
.
14
:
283
293
.

Ebert
D. A.
,
Dando
M
.
2021
.
Field Guide to Sharks, Rays & Chimaeras of Europe and the Mediterranean
.
Wild Nature Press
.
doi:10.2307/j.ctv12sdwkk

Ellis
J.
,
Cruz-Martinez
A.
,
Rackham
B.
,
Rogers
S
.
2005
.
The distribution of chondrichthyan fishes around the British Isles and implications for conservation
.
Journal of Northwest Atlantic Fishery Science
.
35
:
195
213
.

Frank
J.
,
Massey
J
.
1951
.
The Kolmogorov-Smirnov test for goodness of fit
.
Journal of the American Statistical Association
.
46
:
68
78
.

Hefley
T. J.
,
Hooten
M. B
.
2016
.
Hierarchical species distribution models
.
Current Landscape Ecology Reports
.
1
:
87
97
.

Hoef
J. V
.
2002
.
Sampling and geostatistics for spatial data
.
Écoscience
.
9
:
152
161
.

ICES
1990
.
Report of the Working Group on the assessment of the stocks of hake
.
C.M.1990/Assess
.
22
:
171
.

ICES
2002
.
Manual for the International bottom trawl surveys in the Western and Southern areas (revision II). Addendum to ICES CM 2002/D:03 Ref:G ACFM, ACE, 27
pp.

ICES
2005
.
Report of the Workshop on Survey Design and Data Analysis (WKSAD)
.
ICES CM 2005/B:07
,
170pp
Sète, France
.

ICES
2017
.
Manual for the IBTS North Eastern Atlantic Surveys. Series of ICES Survey Protocols
,
15
.
92 pp
. https://doi.org/10.17895/ices.pub.3519
(last accessed on 22 December, 2022)
.

ICES
2020
.
Manual for the North Sea International Bottom Trawl Surveys. Series of ICES survey protocols SISP 10-IBTS 10, Revision 11.
.
102
pp.

ICES
2021
.
Working Group on Elasmobranch Fishes (WGEF)
.
ICES Scientific Reports
.
3
:
822
.

Izquierdo
F.
,
Menezes
R.
,
Wise
L.
,
Teles-Machado
A.
,
Garrido
S
.
2022
.
Bayesian spatio-temporal CPUE standardization: case study of European sardine (Sardina pilchardus) along the western coast of Portugal
.
Fisheries Management and Ecology
.
29
:
670
680
.

Jardim
E.
,
Ribeiro
P. J
.
2007
.
Geostatistical assessment of sampling designs for Portuguese bottom trawl surveys
.
Fisheries Research
.
85
:
239
247
.

Lindgren
F.
,
Rue
H.
,
Lindström
J
.
2011
.
An explicit link between Gaussian and Gaussian Markov random fields: the stochastic partial differential equation approach
.
Journal of the Royal Statistical Society Series B: Statistical Methodology
.
73
:
423
498
.

Machado
P. B.
,
Gordo
L.
,
Figueiredo
I
.
2004
.
Skate and ray species composition in mainland Portugal from the commercial landings
.
Aquatic Living Resources
.
17
:
231
234
.

Martínez-Minaya
J.
,
Cameletti
M.
,
Conesa
D.
,
Pennino
M. G
.
2018
.
Species distribution modeling: a statistical review with focus in spatio-temporal issues
.
Stochastic Environmental Research and Risk Assessment
.
32
:
3227
3244
.

Meyer
S.
,
Warnke
I.
,
Rössler
W.
,
Held
L
.
2016
.
Model-based testing for space-time interaction using point processes: an application to psychiatric hospital admissions in an urban area
.
Spatial and Spatio-temporal Epidemiology
.
17
:
15
25
.

Moura
T.
,
Chaves
C.
,
Figueiredo
I.
,
Mendes
H.
,
Moreno
A.
,
Silva
C.
,
Vasconcelos
R. P
, et al.
2020
.
Assessing spatio-temporal changes in marine communities along the Portuguese continental shelf and upper slope based on 25 years of bottom trawl surveys
.
Marine Environmental Research
.
160
:
105044
.

Pedersen
M. W.
,
Berg
C. W
.
2017
.
A stochastic surplus production model in continuous time
.
Fish and Fisheries
.
18
:
226
243
.

Pennino
M. G.
,
Conesa
D.
,
López-Quílez
A.
,
Muñoz
F.
,
Fernández
A.
,
Bellido
J. M
.
2016
.
Fishery-dependent and -independent data lead to consistent estimations of essential habitats
.
ICES Journal of Marine Science
.
73
:
2302
2310
.

Pennino
M. G.
,
Paradinas
I.
,
Illian
J. B.
,
Muñoz
F.
,
Bellido
J. M.
,
López-Quílez
A.
,
Conesa
D
.
2019
.
Accounting for preferential sampling in species distribution models
.
Ecology and Evolution
.
9
:
653
663
.

Relvas
P.
,
Barton
E. D.
,
Dubert
J.
,
Oliveira
P. B.
,
Peliz
Á.
,
Silva
J. C. B.
,
Santos
A. M. P
.
2007
.
Physical oceanography of the western Iberia ecosystem: latest views and challenges
.
Progress in Oceanography
.
74
:
149
173
.

Roos
M.
,
Held
L
.
2011
.
Sensitivity analysis in bayesian generalized linear mixed models for binary data
.
Bayesian Analysis
.
6
:
259
278
.

Santos
R.
,
Medeiros-Leal
W.
,
Novoa-Pabon
A.
,
Crespo
O.
,
Pinho
M
.
2021
.
Biological knowledge of Thornback Ray (Raja clavata) from the Azores: improving scientific information for the effectiveness of species-specific management measures
.
Biology
.
10
:
676
.

Smith
T. D
.
2002
.
The Woods Hole bottom-trawl resource survey: development of fisheries-independent multispecies monitoring
.
ICES Marince Science Symposia
.
215
:
474
482
.

Spiegelhalter
D. J.
,
Best
N. G.
,
Carlin
B. P.
,
Van Der Linde
A
.
2002
.
Bayesian measures of model complexity and fit
.
Journal of the Royal Statistical Society Series B: Statistical Methodology
.
64
:
583
639
.

Stehmann
M.
,
Bürkel
D. L
.
1984
.
Rajidae
. pp.
Fishes of the North-eastern Atlantic and Mediterrean
.
163
196
.. Ed. by
Whitehead
P.
,
Bauchot
M. L.
,
Hureau
J. C.
,
Nielsen
J.
,
al.
et
UNESCO
,
Paris
.

Thorson
J. T.
,
Barnett
L. A. K
.
2017
.
Comparing estimates of abundance trends and distribution shifts using single- and multispecies models of fishes and biogenic habitat
.
ICES Journal of Marine Science
.
74
:
1311
1321
.

Walker
P. A.
,
Heessen
H. J. L
.
1996
.
Long-term changes in ray populations in the North Sea
.
ICES Journal of Marine Science
.
53
:
1085
1093
.

Walker
P. A.
,
Hislop
J. R. G
.
1998
.
Sensitive skates or resilient rays? Spatial and temporal shifts in ray species composition in the central and north-western North Sea between 1930 and the present day
.
ICES Journal of Marine Science
.
55
:
392
402
.

Zhang
C.
,
Xu
B.
,
Xue
Y.
,
Ren
Y
.
2020
.
Evaluating multispecies survey designs using a joint species distribution model
.
Aquaculture and Fisheries
.
5
:
156
162
.

Zuur
A. F.
,
Ieno
E. N.
,
Saveliev
A. A
.
2017
.
Beginner’s Guide to Spatial, Temporal, and Spatial-Temporal Ecological Data Analysis with R-INLA
.
Highland Statistics Ltd
,
Newburgh, United Kingdom
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Handling Editor: Stan Kotwicki
Stan Kotwicki
Handling Editor
Search for other works by this author on:

Supplementary data