Match of the day: optimized experimental design in alternate-haul gear trials

European Union (EU) fishers need a range of gear options to comply with requirements under the landing obligation. Alternative fishing gears may be implemented provided equivalent selectivity can be demonstrated. Catch comparison is a valid method of testing the size selectivity of two or more fishing gears, and simultaneous gear deployment helps minimize between-haul spatiotemporal variability in abundance. Non-simultaneous or alternate-haul deployments are generally required for single-rig trawls or seine nets. In those gears, matching consecutive test and control hauls helps minimize such variability. Random-haul matching strategies have also been employed where consecutive deployments are not logistically possible. Here, we investigated the effects of different matching methodologies by simulating a range of stylized scenarios of between-haul variation in abundance. We resampled data from a multi-rig catch comparison trial and emulated consecutive or randomly matched hauls. We examined how haul matching methodology influences catch curve estimates and uncertainty. Aiming for a balance in abundance across consecutively matched hauls is optimal, while random-haul matching may be the best strategy if neither balanced abundance nor consecutive hauls can be achieved. Based on these outputs, we provide practical guidance for experimental design during planning and at-sea operations to optimize trial outputs.


Introduction
First introduced in 1970 and updated most recently in 2013, the European Union Common Fisheries Policy (CFP) aims to: maintain or restore stocks of harvested species of living marine biological resources in EU waters above maximum sustainable yield by 2020 and gradually eliminate discards by introducing a landing obligation or discard ban (EU, 2013).
To support the aims of the CFP, the EU has implemented a range of regulations, which include: baseline codend mesh sizes and selective gears (EU, 2019a); annual regionalized "discard plans" with fishery-specific gear measures and provision to add alternative gear measures on the basis of equivalent selectivity with existing measures (e.g. EU, 2019b); and remedial measures with further gear requirements for particular species (EU, 2020).
In the light of this dynamic regulatory environment, it is essential that selectivity characteristics of new or alternative gears can be expeditiously and robustly tested to provide fishers with options to effectively meet changes in management.
The 2019 EU technical measures regulation defines selectivity as the probability of capture of marine biological resources of a certain size and/or species (EU, 2019a). For example, size selectivity may be assessed by estimating the size structure of the fish retained in a codend and relating this to the total population encountered (Wileman et al., 1996). This is typically achieved by deploying a small-mesh cover around a test codend or by deploying an additional small-mesh control codend.
The use of small mesh in such experiments can result in overly large catches and biased results by blocking the path of fish escape through the test codend (Madsen and Holst, 2002) or damaging the small mesh (Hillis and Earley, 1982). These effects can be mitigated by shortening haul duration which reduces codend catch size, a factor that influences codend selectivity (O'Neill and Kynoch, 1996).

© International Council for the Exploration of the Sea 2021. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Size selectivity may also be obtained by estimating the probability that a fish of a given length will be retained in a test gear compared with a control gear (Holst and Revill, 2009). Such "catch comparison" trials do not require the use of small mesh and the gear can be deployed for haul durations representative of commercial fishing operations. Hence, catch comparison trials are ideally suited to scientifically robust gear assessment and this method is considered appropriate for testing and implementing new or alternative gears in EU fisheries (STECF, 2017; Calderwood et al., 2021).
Multi-rig catch comparison trials facilitate simultaneous deployment of test and control gears on the same vessel, which greatly assists in minimizing between-haul spatiotemporal variability in abundance (e.g. Browne et al., 2017; Cosgrove et al., 2019). Such simultaneous deployments are generally not possible in the case of single-rig trawls or seine nets, which usually rely on alternate hauls from the same vessel or parallel hauls on two different vessels (Wileman et al., 1996). Alternating test and control hauls is generally less costly and logistically complex and hence far more commonly employed than the parallel method. Also, parallel hauls may not be technically feasible in the case of certain gears such as seine nets. Compared with multi-rigs, however, alternate hauls are prone to greater between-haul variability. This is particularly evident where a vessel moves ground between deployments or where gear changeover causes difficulties and delays (Sistiaga et al., 2016).
Regardless of the experimental method used, catches, environmental conditions, and gear selectivity parameters vary between hauls (Fryer, 1991). Methods such as random effects models (Holst and Revill, 2009), mixed model smoothers (Fryer et al., 2003), generalized estimating equations that account for correlations/clustering among observations (e.g. Ward and Myers, 2007), and bootstrapping have been used to deal with between-haul variation in catch comparisons. In addition to matching consecutive deployments, alternate hauls have been dealt with by randomly matching test and control hauls with bootstrapping (Sistiaga et al., 2016). The relative impacts of these strategies on estimating between-haul variability are poorly understood.
Here, we investigate the effects of different matching methodologies in alternate-haul gear trials. We use a simulation framework to investigate these effects under a range of stock abundance scenarios. We resample data from a multi-rig catch comparison experiment to emulate consecutive or randomly matched hauls. We describe how haul matching methodology influences catch curve (relative size selectivity) estimates and uncertainty. Finally, we provide practical guidance for experimental design and decisions during planning and at-sea operations to optimize trial outputs.

Fishing gear and sampling operations
A twin-trawl (multi-rig) catch comparison trial was carried out in the Celtic Sea (ICES divisions 7.j and 7.g) during May 2019. Fishing gear consisted of identical twin-rigged whitefish hopper trawls configured using triple warps, a pair of otter boards (trawl doors) and a centre clump weight (Table 1). The test gear comprised a codend and extension piece constructed from 90-mm (nominal mesh size) T90 (turned 90°) mesh. The control gear comprised a codend and extension piece constructed from 80-mm (nominal mesh size) T0 (diamond) mesh fitted with a 3-m-long 120-mm square-mesh panel (SMP) located between 9 and 12 m from the codline.
Test and control codends and extension pieces were constructed in two panels using 4-mm diameter double polyethylene (PE) compacted twine while the SMP was constructed using knotless nylon twine as is common in the fishery. Codend and extension piece circumference was the maximum 120 meshes round for the control and 79 meshes round for the test gear. Although the maximum circumference for both 80 and 90-mm codend mesh sizes is 120 meshes round, trials of T90 mesh in the same area have shown that a reduced circumference of around 80 meshes round yields positive results in terms of ease of handling and improved selectivity (Browne et al., 2016; McHugh et al., 2019; Robert et al., 2020). The fishing operations during the trial targeted gadoids and approximated normal commercial practice. Total catches were quantified and sorted to species level. The total weight of commercial species was recorded along with a random representative sub-sample where necessary. Total length of commercial fish species was measured to the nearest cm below, with raising factors applied to counts if subsampling occurred. Although data were collected for all species encountered, here, cod, haddock and whiting were analysed as they are the key species considered for equivalent selectivity (EU, 2019b). Megrim was also included in this analysis to examine the effects on a flatfish species, as T90 has been shown to be less selective than T0 mesh orientation for flatfish (Madsen et al., 2012; Bayse et al., 2016; Browne et al., 2016).

Simulation
A simplified simulation framework was constructed to develop controlled insights on the potential impact of multi-rig, consecutive, and random-haul matching under a range of between-haul abundance variability scenarios. We used the general model framework of Millar and Fryer (1999) whereby we simulated catches based on the abundance of fish contacting the gear, fishing power of the gears and probability of retention given contact with the test or control gear. Our baseline simulation haul matching was a multi-rig deployment, the data from which was then matched consecutively or randomly. Each component of the simulation is described below.

Population
By "population" we refer to the abundance-at-length $l$ contacting the combined gear in a given haul $h$, $\lambda_{lh}$ (hereafter termed "abundance"). We set abundance contacting the gears (i.e. before individual gear effects) between high (1000 contacting fish across lengths) and low (100 fish) depending on the abundance scenario (below). Then, in all simulations, length of fish contacting the gear was randomly drawn from a lognormal distribution with a mean of 30 cm and standard deviation on the log scale of 0.3. While real populations are comprised of a mixture of length-at-age distributions (Batts et al., 2019), we use a relatively wide single distribution for simplicity. To explore the impact of a variety of potential at-sea trial outcomes, we implemented five stylized between-haul abundance scenarios (Table 2): (i) "exchangeable" where all hauls (ten matched hauls in total to reflect the real data reported) had high contact abundance, representing a trial scenario where the abundance is the same for all hauls; (ii) "balanced" where the first four hauls had low abundance followed by six high-abundance hauls (balance terminology pertains to balance in alternate matching rather than an even number of hauls on both populations), representing a scenario where, for example, the vessel moved ground to a higher abundance area after four hauls; (iii) "unbalanced" where the first three hauls had low abundance and the rest had high abundance, again representing a change of abundance on the grounds but with an uneven number of hauls in the low-abundance grounds; (iv) "sequential" where abundance linearly increases from low to high across the hauls, representing, for example, a trial in which the vessel progressively fishes deeper and encounters increasing abundance; and (v) "random" where abundance randomly varies between low and high populations across hauls, representing a rare or patchily distributed species.
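The five scenarios can be written down compactly. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, the "random" scenario is assumed to pick low or high with equal probability for each haul, and the 30 cm "mean" of the lognormal distribution is assumed to be placed on the log scale (so that 30 cm is the median length).

```python
import math
import random

LOW, HIGH = 100, 1000  # fish contacting the combined gear per haul (values from the text)


def haul_abundances(scenario, n_hauls=10, rng=None):
    """Return the contact abundance of each haul under a named scenario."""
    rng = rng or random.Random()
    if scenario == "exchangeable":   # all hauls high
        return [HIGH] * n_hauls
    if scenario == "balanced":       # four low hauls, then high
        return [LOW] * 4 + [HIGH] * (n_hauls - 4)
    if scenario == "unbalanced":     # three low hauls, then high
        return [LOW] * 3 + [HIGH] * (n_hauls - 3)
    if scenario == "sequential":     # linear increase from low to high
        step = (HIGH - LOW) / (n_hauls - 1)
        return [LOW + step * h for h in range(n_hauls)]
    if scenario == "random":         # assumption: low or high with equal chance
        return [rng.choice([LOW, HIGH]) for _ in range(n_hauls)]
    raise ValueError(f"unknown scenario: {scenario}")


def draw_lengths(n_fish, rng, median_cm=30.0, sd_log=0.3):
    """Draw fish lengths (cm) from a lognormal distribution.

    Assumption: the location parameter is log(30), i.e. 30 cm is the
    median on the natural scale.
    """
    mu = math.log(median_cm)
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_fish)]
```

Under the balanced scenario, hauls 1 to 4 share the low abundance, so pairs (1, 2) and (3, 4) are each matched within one abundance level. Under the unbalanced scenario, haul 3 is low and haul 4 is high, so that pair straddles the change.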

Fishing power and selection probability
We assumed length-constant and equal fishing power of $p = 0.5$ (i.e. of the fish contacting the combined gear, half contacted the control and half the test). Within this framework, the catch rate of a gear is the product of abundance, fishing power and retention probability. Hence, although only population changes were implemented, the effects of length-constant changes in fishing power would be the same in the absence of interactions among effects.
Contact selection curves were: control gear $\mathrm{logit}(r_C(l)) = 0.2 \times (l - 30)$; and test gear $\mathrm{logit}(r_T(l)) = 0.5 \times (l - 35)$. Many values of the selectivity parameters are possible but, because the goal of the simulation was to develop insights rather than to run an exhaustive factorial simulation, we kept the differences between the gears simple but of large magnitude.
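To make the two selection curves concrete, the sketch below evaluates them and derives the true proportion retained in the test gear, taken here as the expected test catch divided by the expected catch of both gears. The function names are illustrative, and the ratio form is the standard catch comparison definition rather than something stated explicitly in this section.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def retention_control(length_cm):
    """Control gear: logit(r_C(l)) = 0.2 * (l - 30)."""
    return inv_logit(0.2 * (length_cm - 30.0))


def retention_test(length_cm):
    """Test gear: logit(r_T(l)) = 0.5 * (l - 35)."""
    return inv_logit(0.5 * (length_cm - 35.0))


def true_test_proportion(length_cm, p_test=0.5):
    """Expected share of the total catch at this length taken by the test gear.

    p_test is the share of contacting fish that meet the test gear
    (0.5 in the simulation, i.e. equal fishing power).
    """
    expected_test = p_test * retention_test(length_cm)
    expected_control = (1.0 - p_test) * retention_control(length_cm)
    return expected_test / (expected_test + expected_control)
```

With these parameters the proportion is close to zero for small fish, because the test gear releases them much more readily than the control, and it approaches 0.5 for large fish, which both gears retain almost fully.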

Random variability
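We sampled the catch-at-length for a given gear from a Poisson distribution (Millar and Fryer, 1999) given by:

$$y_{lhj} \sim \mathrm{Poisson}\left(\lambda_{lh}\, p\, r_j(l)\right),$$

where $\lambda_{lh}$ is the abundance-at-length contacting the combined gear in haul $h$, $p$ is the fishing power, $r_j(l)$ is the retention probability at length $l$, and $j$ indexes the gear.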
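A short Python sketch of this sampling step follows. It assumes the Poisson mean is the product of abundance-at-length, fishing power and retention probability, as described for the catch rate above. The helper names and the dictionary layout are hypothetical, and the Poisson sampler uses Knuth's multiplication method only because the standard library has none.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def poisson_draw(mean, rng):
    """Draw one Poisson variate (Knuth's method; fine for modest means)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


RETENTION = {
    "control": lambda l: inv_logit(0.2 * (l - 30.0)),
    "test": lambda l: inv_logit(0.5 * (l - 35.0)),
}


def simulate_catch(abundance_at_length, rng, p=0.5):
    """Simulate catch-at-length for both gears in one haul.

    abundance_at_length maps a length class (cm) to the expected number
    of fish of that length contacting the combined gear.
    """
    catch = {gear: {} for gear in RETENTION}
    for length, abundance in abundance_at_length.items():
        for gear, retention in RETENTION.items():
            mean = abundance * p * retention(length)
            catch[gear][length] = poisson_draw(mean, rng)
    return catch
```

For example, `simulate_catch({20: 30.0, 35: 50.0, 50: 10.0}, random.Random(7))` returns one simulated haul with a count per gear and length class.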

Haul matching
Our baseline haul matching uses multi-rig deployment, where both gears simultaneously sample the same population. "Consecutive" comprises matching sequential deployments (i.e. control gear in haul one with test gear in haul two, vice versa and so on so that all hauls are matched). "Random" comprises randomly matching control and test hauls (e.g. with ten simultaneous matches there are 10 × 10 possible random matchings).
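The two emulated strategies can be sketched as index pairs of the form (control haul, test haul). This is one reading of the description, not the authors' implementation. Consecutive matching is assumed to use disjoint pairs of neighbouring hauls, matched in both directions, and random matching is assumed to draw control and test hauls independently with replacement. All function names are illustrative.

```python
import random


def consecutive_matches(n_hauls):
    """Pair hauls (1, 2), (3, 4), ... in both directions.

    Each tuple is (control haul, test haul). With an odd number of hauls
    the last haul is left unmatched.
    """
    pairs = []
    for first in range(1, n_hauls, 2):
        second = first + 1
        pairs.append((first, second))  # control in `first`, test in `second`
        pairs.append((second, first))  # and vice versa
    return pairs


def random_matches(n_hauls, n_matches, rng):
    """Draw (control haul, test haul) pairs at random, with replacement."""
    return [(rng.randint(1, n_hauls), rng.randint(1, n_hauls))
            for _ in range(n_matches)]


def abundance_mismatches(pairs, abundance):
    """Count matches whose two hauls differ in contact abundance."""
    return sum(abundance[c - 1] != abundance[t - 1] for c, t in pairs)
```

With ten hauls, `abundance_mismatches(consecutive_matches(10), [100] * 4 + [1000] * 6)` is zero for the balanced scenario, whereas the unbalanced pattern `[100] * 3 + [1000] * 7` leaves the matches between hauls 3 and 4 spanning low and high abundance.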

Replicates
Models fit to each scenario were the same as those described below for the real data except no sub-sampling of the catch was conducted in the simulation. For each scenario, we ran 1000 simulations and reported the average catch curve and average upper and lower 95% confidence limits per scenario and matching methodology. We tabulate the average confidence intervals at small, mid, and large sizes to demonstrate example precision changes (Table 3).

Table 2. Stylised between-haul abundance variation scenarios.

Real trial haul matching
Our baseline analysis comprises the multi-rig matching during the actual gear trial (termed "simultaneous matching" hereafter).
To emulate a consecutive matching methodology, hauls were matched in sequence, e.g. control haul 1 matching with test haul 2 (see Figure 1). To emulate a scenario where it is not suitable to match hauls in sequence, we also randomly matched hauls using methods similar to Sistiaga et al. (2016) (termed "random matching" hereafter). Between-haul variability and sub-sampling ratios were incorporated to estimate a mean curve with 95% confidence intervals. To illustrate how inferences from the data can differ based on how hauls are matched, the generalized additive model (GAM) applied was kept consistent (as explained in the model description section below) through all three matching methodologies. Model development used the count of measured samples in each haul, grouped by length class and split into control and test gear categories. As counts were subsampled, an offset was applied for the proportion of the catch in either codend (Holst and Revill, 2009). Sistiaga et al. (2016) applied a bootstrap (with replacement) to the hauls to randomly match among the observed hauls and within-haul sampling. The method randomly matches hauls, estimates the proportion retained (catch curve), stores the curves, and repeats 10 000 times to estimate the mean curve and 95% bootstrap confidence intervals. Note that while Sistiaga et al. (2016) propose recovering selectivity parameters (e.g. L50 and selection range), here we ran catch comparison analyses.
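The random matching bootstrap can be illustrated with a deliberately simplified Python sketch. It is not the method of Sistiaga et al. (2016) or the analysis used here. It replaces the fitted catch curve with a raw pooled proportion for a single length class, it omits the within-haul resampling, and it defaults to 1000 rather than 10 000 repeats to keep the example fast. The function name and arguments are hypothetical.

```python
import random


def bootstrap_random_matching(control_counts, test_counts, n_boot=1000, seed=1):
    """Bootstrap the proportion retained in the test gear for one length class.

    control_counts / test_counts: per-haul counts in each gear.
    Each repeat resamples control and test hauls independently (with
    replacement), pools them, and stores test / (test + control).
    Returns (mean, lower, upper) with a 95% percentile interval.
    """
    rng = random.Random(seed)
    n_pairs = min(len(control_counts), len(test_counts))
    proportions = []
    for _ in range(n_boot):
        control_total = test_total = 0
        for _ in range(n_pairs):
            control_total += rng.choice(control_counts)
            test_total += rng.choice(test_counts)
        if control_total + test_total > 0:  # skip empty resamples
            proportions.append(test_total / (control_total + test_total))
    if not proportions:
        raise ValueError("no fish in any resampled match")
    proportions.sort()
    n = len(proportions)
    lower = proportions[int(0.025 * n)]
    upper = proportions[min(n - 1, int(0.975 * n))]
    return sum(proportions) / n, lower, upper
```

Because control and test hauls are drawn independently, a match can combine hauls taken on different abundances. That between-haul variation is what widens the interval.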

Non-bootstrap model description
The response data $y_{h,i,j}$ comprised the count of a given species in haul $h$, length-class $i$ and compartment $j \in \{C, T\}$ (control and test). We modelled the count in the test gear as a binomial generalized additive mixed model.

Linear predictor:

$$y_{h,i,T} \sim \mathrm{Binomial}\left(n_{h,i},\, p_{h,i,T}\right), \qquad \mathrm{logit}\left(p_{h,i,T}\right) = s(l_i) + u_h + o_h,$$

where $n_{h,i}$ is the total count of fish in length class $i$ in haul $h$; $p_{h,i,T}$ is the proportion of the catch retained in the test gear (the catch curve) in haul $h$, length-class $i$; $s(l_i)$ is a penalized cubic spline function of length with up to five knots (fewer for the random bootstrap, depending on the number of length categories in the random sample; and up to a maximum of ten for the simulated fits) placed at quantiles of length; $u_h \sim N(0, \sigma_h)$ are the haul-level random effects on the log-odds, assumed normally distributed with standard deviation $\sigma_h$; and $o_h = \ln\left(q_{h,T}/q_{h,C}\right)$ is the offset reflecting the proportion $q$ of the mass of the species measured per compartment in haul $h$. The model was fit for each species by maximum likelihood using the mgcv (Wood, 2011) package in R (R Core Team, 2016).
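The role of the subsampling offset can be checked numerically. The sketch below assumes the linear predictor is the sum of the smooth term, the haul-level random effect and the offset on the logit scale; the function names are illustrative and no model is fitted.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def subsampling_offset(q_test, q_control):
    """Offset ln(q_T / q_C) from the fractions of catch measured per compartment."""
    return math.log(q_test / q_control)


def expected_measured_proportion(smooth_value, haul_effect, q_test, q_control):
    """Expected share of measured fish that come from the test gear.

    smooth_value and haul_effect are on the log-odds scale.
    """
    eta = smooth_value + haul_effect + subsampling_offset(q_test, q_control)
    return inv_logit(eta)
```

If both gears truly catch equal numbers at a given length (log-odds of zero) but only half of the test catch is measured while the whole control catch is measured, the expected measured proportion in the test gear is 1/3 rather than 1/2. The offset lets the model account for this so that the smooth describes the raised catch rather than the measured sample.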

Simulation
Estimated catch-curves were relatively accurate across methods and scenarios with some systematic departures from the true curve (e.g. Figure 2). Greater differences were observed in the precision across scenarios and estimation methods (Figure 2). For simultaneous matching, the uncertainty (average 95% confidence interval) reflects the overall abundance of each scenario: from "A" (ten high hauls), "C" (seven high/three low), "B" (six high/four low) to "E" (five high/five low on average) (Figure 2). Despite this slight trend, the uncertainties are similar and low for simultaneous deployments (Figure 2 and Table 3). The exchangeable abundance scenario had similar levels of uncertainty across the three matching strategies (Figure 2A), as expected and reflecting that a suitable number of simulations were run. As anticipated, the same low level of uncertainty occurred using simultaneous and consecutive haul matching in the balanced scenario, while uncertainty was substantially greater using random matching that matches across different populations (Figure 2B and Table 3). Uncertainty was lower under consecutive matching than random matching in the sequential abundance scenario (Figure 2D), reflecting that consecutive matches are on more similar populations than random matching. Substantially greater uncertainty occurred using consecutive matching in the unbalanced scenario, reflecting differences in abundance across matched hauls. Also, random matching had slightly lower uncertainty compared with consecutive haul matching (Figure 2C). The greatest uncertainty occurred using consecutive haul matching under the random abundance scenario. Random-haul matching reduced this uncertainty to some extent (Figure 2E).

Figure 1. Twin-trawl catch comparison experiment: simultaneous (rows) and emulated consecutive haul matching sequence (connected by diagonal lines). Hauls 4 and 12 were invalid and control and test gears were swapped between port and starboard trawls after haul 6.

Figure 2. Simulation results showing the mean proportion and mean 95% confidence intervals for the three matching strategies (columns) under five between-haul population scenarios (rows). Solid lines denote mean proportion and grey shaded area the mean 95% confidence intervals across 1000 simulations for that scenario. Points represent the true proportions.

Fishing operations and catch data
A total of 12 hauls were completed over 4 days but hauls 4 and 12 were considered invalid as the trawl doors failed to spread sufficiently as reported by gear monitoring sensors. Mean haul duration, towing speed, and depth fished were 03:59 h, 2.8 kt, and 92 m respectively. Length frequencies of cod, whiting, haddock, and megrim display varying numbers by species with haddock dominating the catches overall (Figure 3). Catches of haddock and whiting below the Minimum Conservation Reference Size (MCRS) were practically eliminated and greatly reduced for cod in the test 90-mm T90 codend compared with the control 80-mm codend with 120-mm SMP (Figure 3). Catches of megrim < MCRS were similar in both gears and neither retained appreciable quantities of this species (Figure 3). The test gear retained fewer whiting ≥ MCRS and more haddock ≥ MCRS compared with the control gear. Catches of ≥ MCRS cod and megrim were similar in both gears.

Real trial haul matching
Empirically, simultaneous and consecutive matching of hauls had broadly similar patterns in the proportions retained (Figure 4). No random-haul matching results are shown because inference is based on many such matches. Specific differences are, however, apparent with haddock of ~40 cm, with greater proportions in the control gear when consecutively matched (Figure 4B). Broadly similar catches at length occurred across haul matching strategies and species except for megrim, where the shape of the proportion retained was qualitatively different, being logit-linear for simultaneous matching (Figure 5A, megrim). Apparent differences are typically not significant, as approximately judged by the overlapping of the confidence intervals on either side of the equal retention line (e.g. large haddock >60 cm).
Simultaneous haul matching yielded narrower confidence intervals around the mean curve (Figure 5) for all species, indicating reduced uncertainty. Such differences would impact inference as judged by the overlap of the confidence intervals with the equal retention line (e.g. whiting 40-45 cm simultaneous compared to consecutive haul matching, megrim >35 cm simultaneous compared to consecutive and random-haul matching). Consecutive haul matching shows increased uncertainty compared with the simultaneous model. This is attributable to a haul effect related to non-simultaneous deployment of test and control gears, which may confound finer scale inference on the size selectivity of the two gears. For megrim and haddock, we observed a clear pattern of increasing uncertainty moving from simultaneous to consecutive to random-haul matching (Figures 5 and 6, haddock and megrim). For cod, uncertainty was similar between the consecutive and the random-haul matching strategies except for those >60 cm where there were very few fish.
Interestingly, for whiting we observed narrower confidence intervals for the random matching than for the consecutive matching (Figure 5B and C, whiting). When matching across different whiting abundances in hauls 3 and 5, there is an increase in magnitude of the random effects in the model (Figure 7) as it accounts for the variation in abundance in these hauls. This contributes to relatively greater standard errors at length (Figure 6) and wider confidence intervals (Figure 5) in the consecutive compared with the random-haul matching strategy.
A decrease in information when moving from simultaneous to consecutive haul matching is also seen in the distribution of the between-haul variability captured by the random effects (Figure 7). The random effects were typically larger in magnitude for the consecutive matching than for the simultaneous matching (Figure 7).

Discussion
The simulated framework (Table 2 and Figure 2) offers a reference for optimized experimental planning with the flexibility to make informed real-time operational decisions in relation to variable abundance. Simultaneous haul matching associated with multi-rig deployment greatly reduces uncertainty across all abundance scenarios and should be employed where the gear is representative of the commercial fishery. For vessels engaged in alternate hauls, similarly low levels of uncertainty are evident in the exchangeable high-abundance scenario regardless of matching strategy. In the case of sequential and balanced-abundance scenarios, consecutive matching is optimal. This matching strategy can occur by default or through informed real-time haul planning. For example, steadily increasing catches may occur as a trip unfolds, e.g. through improved fish detection or increasing aggregations. Under this sequential scenario, the skipper is unlikely to move grounds, which naturally facilitates consecutive matching. Alternatively, suppose relatively high catches occur in two consecutive hauls and, following a low catch in the subsequent haul, the skipper is keen to move grounds. A decision can be made to complete a further haul in the same low-abundance area, achieve a consecutive match with the previous haul, and aim for a balanced-abundance scenario with low uncertainty (Figure 2B). Contingency hauls should be planned to allow for this. Another option may be to drop a haul from the analysis, but this approach has costs in terms of losing hard-won trial data or potentially biasing results.
It may not always be possible to conduct an extra haul in the same area e.g. poor weather or quota exhaustion forces a vessel to change fishing grounds leading to an unbalanced-abundance scenario. In this case, random matching may be the best strategy for reducing uncertainty. Fouled or invalid hauls can also preclude consecutive hauls. In the current study, haul 4 was invalid and the vessel moved grounds. Major variation in abundance across matched hauls 3 and 5 resulted in greater uncertainty in the consecutive compared with the random matching strategy as this was effectively an unbalanced scenario.
Fishing operations may also affect abundance. For example, while spatial and temporal variability between hauls should be minimized to evaluate gear rather than population effects, fishing over the same grounds twice may result in local depletion and incomparable hauls (Rijnsdorp, 2000;Poos and Rijnsdorp, 2007).
Representative sampling is fundamental to fisheries science and fishing gear trials aim to emulate commercial fisheries while minimizing uncertainty. Covered codend experiments reduce spatiotemporal variability but shorter haul durations may lead to biased results. Multi-rig catch comparisons also minimize spatiotemporal variability but may not be representative of commercial fisheries using single-rig gear. Differences in key gear characteristics such as fishing circle, headline height, herding effects, and haul duration influence catchability and gear selectivity. For example, haddock typically rise ahead of the approaching trawl and often escape over the headline (Main and Sangster, 1981). Small fish swim slower than large fish (Wardle, 1986). Hence, a multi-rig trawl with a lower headline height is likely to retain proportionally fewer small fish and to have different selective properties compared with a single-rig trawl. Studies that aim to assess the selectivity of single-rig gear should be based on that specific gear. This study demonstrates how uncertainty associated with alternate-haul testing of single-rig gears can be effectively reduced and advances the case for gear-specific assessments of fish selectivity.

Funding
GMIT's contribution was gratefully funded by BIM. This study was funded by the Irish Government and part-financed by the European Union through the EMFF Operational Programme 2014-2020 under the BIM Sustainable Fisheries Scheme.

Figure 5. Overall fitted proportion at length for four key species encountered during trial estimated with (A) hauls matched simultaneously, (B) consecutively and (C) randomly with bootstrapping. The solid curved lines are generated by a GAM and indicate the predicted proportion at length, and the grey area either side represents the 95% confidence intervals. Each point represents the empirical raised proportion over all hauls and its diameter is proportional to the raised count.

Data availability statement
Data are available on request to the corresponding author. The simulation and real data analysis files are available at: https://github.com/mintoc/datalo_public/tree/master/motd_paper_code.

Acknowledgement
We are grateful to Mick Gillen and the crew of the MFV Foyle Fisher, and John George Harrington for their assistance during the gear trial. We are also very grateful to two anonymous reviewers who contributed greatly to the clarity and quality of the final manuscript.

Figure 7. Log-odds random effects plots displaying between-haul variation from simultaneous (circle) and consecutive (triangle) haul matching scenarios.