Steve J Fleischman, Debby L Burwen, Mixture models for the species apportionment of hydroacoustic data, with echo-envelope length as the discriminatory variable, ICES Journal of Marine Science, Volume 60, Issue 3, 2003, Pages 592–598, https://doi.org/10.1016/S1054-3139(03)00041-9
Abstract
For this side-looking, 200 kHz, split-beam sonar application, echo-envelope length has been shown to be predictive of fish size. In this study, this relationship is exploited to estimate the abundance of (large) chinook salmon (Oncorhynchus tshawytscha) in the presence of (smaller) sockeye salmon (Oncorhynchus nerka). The echo-length to fish-size relationship is too imprecise to ascertain the species of individual fish in the classic sense. However, the frequency distribution of echo-length measurements contains information on the relative abundance of chinook and sockeye salmon. The use of echo-length measurements in a mixture model is explored in order to estimate the proportion of total fish passage that comprised chinook salmon. Inputs to the model include empirical estimates of the length–frequency distribution for each species, parameter estimates from the regression relationship of echo-length to fish-length, and echo-length measurements from individual, ensonified fish. Outputs are estimates of the proportions of chinook and sockeye salmon in the river. The advantages of the mixture-model approach over threshold-based discrimination are discussed. Conditional maximum likelihood and Bayesian versions of the model are described. The method can be generalized to other hydroacoustic measurements, including target strength and other discrimination problems.
Introduction
Attempts to identify species of fish with hydroacoustic data have met with limited success, and often that success has been achieved only with expensive hardware or under restrictive circumstances (Horne, 2000). In particular, narrow-band measurements on individual fish have rarely been used to identify species, and then only in concert with other school-based descriptors and when the species present were vastly different in size or anatomy (Rose and Leggett, 1988; Vray et al., 1990).
Part of the difficulty is that measurements of target strength (TS) can be extremely variable or biased. This is particularly true for side-looking, shallow-water, sonar applications, which are subject to boundary effects (Mulligan, 2000), low signal-to-noise ratios (Kieser et al., 2000), variable side aspects of fish (Kubecka, 1994), and point-source violations (Dawson et al., 2000). In a companion study (Burwen et al., 2003), an alternative class of size discriminators is proposed based on echo-envelope length and range measurements, which rely primarily on characterization of the acoustic signal through time. For side-looking applications, time-based measurements are robust to some of the factors (e.g. fish aspect, point-source violations) that introduce extreme variability to the measurements of peak amplitude. They may, therefore, predict fish size better than TS in some circumstances.
An additional difficulty is that the species of interest may differ only modestly in size. This results in hydroacoustic measurements that overlap, making discrimination difficult with a threshold-based approach. In this article, statistical techniques based on “mixture models” are described; these are especially useful for estimating species composition when the discriminating variable is imprecise or the distributions overlap. By modelling the frequency distributions of hydroacoustic measurements as mixtures of distributions due to two or more component species, many of the problems associated with conventional threshold-based discrimination can be avoided.
Echo-envelope length measurements from chinook (Oncorhynchus tshawytscha) and sockeye salmon (Oncorhynchus nerka) in the Kenai River in Alaska, which overlap in size, are used to demonstrate the method. Two methods of estimation are described: a conditional maximum-likelihood (CML) algorithm that can be implemented in a spreadsheet to produce point estimates of species composition; and a more powerful Bayesian version implemented in WinBUGS (Gilks et al., 1994), which produces realistic estimates of uncertainty and can incorporate auxiliary information on the model parameters. It is proposed that the techniques described in this study may be applicable, with adaptation, to other species-discrimination problems, including those based on dorsal-aspect TS measurements.
Methods
Hydroacoustic data were collected during May–June 2001 and 2002 at an established sonar installation on the Kenai River, Alaska (Miller and Burwen, 2002). An HTI Model 244 split-beam echosounder operating at 200 kHz and a 2.9° by 10° elliptical-beam transducer with a near-field range of 3.1 m were used. Pulses were 0.2 ms long and transmitted at a rate of 11–16 s⁻¹. Echoes were rejected if they did not meet a minimum voltage threshold, equivalent to a −35 dB on-axis target.
Recent unpublished work indicates that targets located far from the acoustic axis may suffer a slight negative bias in ELSD. Therefore, only the fish less than 3 dB off-axis were used in the mixture-model analyses reported in this article. These fish comprised 47 and 63% of all fish in the 2001 and 2002 datasets, respectively.
Mixture model
Thus the component distributions fS(y) and fC(y) were functions of the length distributions fS(x) and fC(x) and the linear-model parameters β0, β1, γ, and σ² (Figure 1). The species proportions πS and πC were the parameters of interest, and two methods were used to estimate them. The CML method provided reasonable point estimates and the means to evaluate model fit. The Bayesian method provided estimates of uncertainty and the ability to incorporate auxiliary information. Modelling of fS(x), fC(x), β0, β1, and γ differed depending on the method of estimation.
CML estimates of species composition
The first method, which can be implemented in a spreadsheet, finds the maximum-likelihood estimate of πC (and, therefore, πS=1−πC) conditional on the regression parameters. Species-specific length distributions fS(x) and fC(x) were modelled non-parametrically by re-sampling from observed length data. In this case, length measurements were obtained from a gillnetting project conducted immediately downstream of the sonar site. Length data were paired with hydroacoustic data from the same time periods. In this study, no gillnet size selectivity within species is assumed. Estimates of the regression parameters β0, β1, γ, and σ² were obtained from tethered-fish experiments (Burwen et al., 2003), and were considered fixed. The method proceeded as follows.
Using Equation (7), the probability of each observed value of y, given πC=pC, was calculated and its logarithm was taken. Log likelihoods were summed across all observed {y}. The value of pC that maximized the log likelihood was the CML estimate of πC.
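The grid search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' spreadsheet: it assumes the mixture density has the usual two-component form πS·fS(y) + πC·fC(y), that each component is obtained by averaging a normal regression density over the re-sampled lengths, and that γ acts as an additive shift for sockeye. The function names and the grid resolution are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def component_density(y, lengths, b0, b1, shift, sigma):
    """f(y) for one species: the regression density averaged over re-sampled lengths."""
    y = np.asarray(y, dtype=float)
    means = b0 + shift + b1 * np.asarray(lengths, dtype=float)
    return normal_pdf(y[:, None], means[None, :], sigma).mean(axis=1)

def cml_estimate(y, len_sockeye, len_chinook, b0, b1, gamma, sigma, grid=None):
    """Grid-search the chinook proportion pC that maximizes the summed log likelihood.

    Assumption (not confirmed by the text): gamma is added for sockeye only.
    """
    grid = np.linspace(0.0, 1.0, 1001) if grid is None else np.asarray(grid)
    f_s = component_density(y, len_sockeye, b0, b1, gamma, sigma)
    f_c = component_density(y, len_chinook, b0, b1, 0.0, sigma)
    # Rows index candidate pC values, columns index observed targets.
    mix = grid[:, None] * f_c[None, :] + (1.0 - grid[:, None]) * f_s[None, :]
    with np.errstate(divide="ignore"):
        loglik = np.log(mix).sum(axis=1)
    return float(grid[np.argmax(loglik)]), loglik
```

Plotting the returned log-likelihood profile against the grid gives a quick visual check of how sharply the data determine πC for fixed regression parameters.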
Bayesian estimates
There are several sources of uncertainty in the mixture-model estimates of species composition previously described. These include sampling error from estimating: (1) fish-length distributions from the netting data; (2) the distribution of the hydroacoustic variable y from the sonar data; and (3) the vector of regression parameters (slope, intercept, species effect, and error variance) from the tethered-fish experiments. There is also potential for bias when regression parameters estimated from tethered fish are applied to free-swimming fish. Although estimates from the CML method could be bootstrapped to provide approximate standard errors, a Bayesian version of the mixture model was implemented instead. Bayesian methods are particularly well suited for assessing uncertainty in complex or unconventional estimators. They also provide a formal way to incorporate auxiliary information on the parameters of the model. The Bayesian mixture model was implemented in WinBUGS (BUGS: Bayesian inference Using Gibbs Sampling; Gilks et al., 1994), available free from http://www.mrc-su.cam.ac.uk/bugs/Welcome.html. For examples of fisheries applications of WinBUGS, see Meyer and Millar (1999), Millar and Meyer (2000), and Harley and Myers (2001).
This is convenient because estimates of the length means {μ}, the variances {τ²}, and even the age proportions {θ} are available from other fisheries-research projects. The overall design was therefore a mixture of (transformed) mixtures. Thus, the observed hydroacoustic data were modelled as a two-component mixture of y, each component of which was transformed from a three-component normal mixture of x. In this case, the subcomponents corresponded to ages, but such a design could also be used as a purely synthetic way to approximate skewed or multimodal length distributions in other applications.
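One way to write out the “mixture of transformed mixtures” is sketched below. It rests on assumptions the text here does not confirm: that length at age is normal, that the linear model has additive normal error with variance σ², and that γ enters as an additive shift for sockeye. The species index k, the age index a, and the indicator notation are introduced only for this sketch.

```latex
\begin{align*}
x \mid (k, a) &\sim \mathcal{N}\!\left(\mu_{ka},\ \tau_{ka}^{2}\right),
    \qquad k \in \{S, C\},\quad a = 1, 2, 3 \\
y \mid (x, k) &\sim \mathcal{N}\!\left(\beta_0 + \beta_1 x + \gamma\,\mathbb{1}[k = S],\ \sigma^{2}\right) \\
y \mid (k, a) &\sim \mathcal{N}\!\left(\beta_0 + \beta_1 \mu_{ka} + \gamma\,\mathbb{1}[k = S],\
    \beta_1^{2}\tau_{ka}^{2} + \sigma^{2}\right) \\
f_k(y) &= \sum_{a=1}^{3} \theta_{ka}\,
    \mathcal{N}\!\left(y;\ \beta_0 + \beta_1 \mu_{ka} + \gamma\,\mathbb{1}[k = S],\
    \beta_1^{2}\tau_{ka}^{2} + \sigma^{2}\right) \\
f(y) &= \pi_S\, f_S(y) + \pi_C\, f_C(y), \qquad \pi_S + \pi_C = 1
\end{align*}
```

The third line follows because a linear function of a normal variable plus independent normal error is again normal, with the length variance scaled by β1² and the regression variance added.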
Three linear model parameters were regarded as unknown in the model: the intercept parameter, β0; the difference between sockeye and chinook salmon, γ; and the slope, β1. For the analyses presented in this article, the error variance around the regression was regarded as fixed (σ2=0.432).
Species proportions πS and πC were assigned an uninformative Dirichlet(1,1) prior. Likewise, age proportions {θSa} and {θCa} were assigned Dirichlet(1,1,1) priors. Informative normal priors, based on auxiliary data available from other research projects, were used for the length-at-age parameters.
Based on the results of tethered-fish experiments, informative normal priors were also used for regression parameters in the Bayesian mixture model. Linear statistical models of tethered-fish data reported by Burwen et al. (2003) provided estimates of the regression parameters β0, β1, and γ to construct reasonable prior distributions (Table 1).
Table 1. Normal priors and summary statistics of the marginal posterior distributions, by week.
Parameter | Mean | s.d. | 2.5% | Median | 97.5%
---|---|---|---|---|---
Normal priors | | | | |
β0 | 2.88 | 0.18 | | |
β1 | 0.0319 | 0.0029 | | |
γ | −0.33 | 0.11 | | |
Week 1 posteriors: 32 fish netted, 89 hydroacoustic targets | | | | |
β0 | 2.84 | 0.10 | 2.65 | 2.85 | 3.03
β1 | 0.0338 | 0.0027 | 0.0288 | 0.0337 | 0.0392
γ | −0.36 | 0.09 | −0.54 | −0.36 | −0.18
πC | 0.542 | 0.083 | 0.392 | 0.537 | 0.719
Week 2 posteriors: 47 fish netted, 88 hydroacoustic targets | | | | |
β0 | 2.81 | 0.10 | 2.61 | 2.81 | 3.01
β1 | 0.0330 | 0.0027 | 0.0280 | 0.0330 | 0.0386
γ | −0.38 | 0.09 | −0.56 | −0.38 | −0.19
πC | 0.518 | 0.102 | 0.334 | 0.512 | 0.735
Week 3 posteriors: 52 fish netted, 576 hydroacoustic targets | | | | |
β0 | 2.81 | 0.09 | 2.64 | 2.81 | 2.99
β1 | 0.0356 | 0.0026 | 0.0308 | 0.0355 | 0.0408
γ | −0.30 | 0.09 | −0.47 | −0.30 | −0.12
πC | 0.244 | 0.043 | 0.165 | 0.242 | 0.334
WinBUGS uses Markov chain Monte Carlo (MCMC) methods to sample from the joint posterior distribution of all unknown quantities in the model. Two over-dispersed Markov chains were started for each run, and Gelman–Rubin statistics were monitored to assess convergence. Some models exhibited slow mixing and extreme autocorrelation. Therefore, relatively long “burn-ins” of 10 000 or more samples were used. Samples were thinned 10 to 1 thereafter, and at least 10 000 samples per chain were retained.
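For readers working outside WinBUGS, the convergence check and the burn-in and thinning steps can be sketched as follows. This uses the basic variance-ratio form of the Gelman–Rubin statistic, which is not necessarily the exact diagnostic that WinBUGS reports; the function names and default arguments are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (m chains, n draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(pooled / within))

def burn_and_thin(chain, burn_in=10_000, thin=10):
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return np.asarray(chain)[burn_in::thin]
```

Values close to 1 suggest that the chains have reached the same distribution; values well above 1 indicate that longer runs or better starting values are needed.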
Results and discussion
Conventional two-class, univariate discrimination involves assigning individuals to one class or another depending on whether or not the value of the discriminating variable exceeds a threshold. When distributions overlap, threshold-based discrimination is subject to bias that becomes worse for species proportions near 0 and 1 (Figure 2). Furthermore, the results are sensitive to fish-size distributions. For instance, in the example illustrated in Figure 2, the number of chinook salmon misclassified as sockeye (number with ELSD<2.7) depends largely on the relative abundance of small chinook, which can change over time. In fact, use of such a threshold by itself does not discriminate chinook from sockeye, but rather large chinook from sockeye and small chinook.
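The direction of the threshold bias can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each species' measurements are normally distributed; the component means and standard deviations are hypothetical and are not estimates from this study.

```python
from statistics import NormalDist

def threshold_estimate(p_chinook, threshold, sockeye, chinook):
    """Expected fraction of targets classed as chinook (measurement above threshold)."""
    above_chinook = 1.0 - chinook.cdf(threshold)   # chinook correctly classified
    above_sockeye = 1.0 - sockeye.cdf(threshold)   # sockeye misclassified as chinook
    return p_chinook * above_chinook + (1.0 - p_chinook) * above_sockeye

# Hypothetical overlapping components and a threshold between them.
sockeye = NormalDist(mu=2.3, sigma=0.4)
chinook = NormalDist(mu=3.2, sigma=0.5)

for true_p in (0.05, 0.50, 0.95):
    est = threshold_estimate(true_p, 2.7, sockeye, chinook)
    print(f"true proportion {true_p:.2f} -> expected threshold estimate {est:.3f}")
```

Under these assumed components, the expected threshold estimate is pulled towards the middle of the range: it overstates a small chinook proportion and understates a large one, because misclassifications in the two directions no longer cancel.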
Because the mixture-model approach incorporates information on fish-size distributions, and because it explicitly models the expected variability in hydroacoustic measurements, it is not subject to the above pitfalls. There is no bias against extreme proportions, and the estimates are germane to the entire population of chinook salmon, not just those chinook larger than sockeye. Finally, provided length and hydroacoustic measurements are paired in time, mixture-model estimates of species proportions are unbiased in the presence of temporal changes in fish-size distribution.
CML estimates were generated for 9 weeks of 200 kHz side-looking data collected at the Kenai River in 2001 (Figure 3, 6 weeks shown). In addition, 3 weeks of data from May–June 2002 were analysed with WinBUGS (Figure 4). Summary statistics from the marginal posterior distributions of several parameters are given in Table 1.
The CML method estimates species composition conditional on fixed values of the regression parameters. The parameter values successfully explained the observed ELSD distributions during the first 5 weeks, yielding estimates of chinook relative abundance ranging from 24 to 60%. However, in week 6, it was impossible to obtain a good fit with those same parameters (Figure 3f).
This discrepancy could be explained by a change in the relationship between fish size and ELSD. For example, a 0.3 unit increase in the intercept parameter β0 was sufficient to provide a reasonable fit for the following 4 weeks. Unfortunately, the estimate of chinook proportion can be sensitive to such shifts in regression parameters. Burwen et al. (2003) noted that tethered-fish regression parameters (particularly β0 and γ) often differed between experiments conducted in different years.
Posterior distributions of regression parameters shifted only slightly between the 3 weeks of 2002 data analysed (Figure 4, Table 1). We are encouraged that the Bayesian model appeared to effectively respond to small changes in the relationship between ELSD and fish size. Model fit appeared to be good with parameters set to the posterior means (Figure 4).
One of the key advantages of Bayesian analysis is the ability to incorporate auxiliary information in the form of prior distributions, thereby reducing the breadth of posterior distributions (i.e. improving the precision of estimates). Informative priors for regression parameters and mean length-at-age have been used in this analysis, but there remain other opportunities for exploiting this capability. An obvious example derives from use of the netting data. The species composition of the net catches contains information on πC and πS. Such information could be translated into an informative prior distribution for those parameters, thereby synthesizing species-composition estimates from hydroacoustic and netting sources into one. Parameter estimates can also be updated recursively as more data become available. Thus, posterior distributions from one data set are employed as prior distributions for the next. Such a strategy could be employed with parameters that are constant or change slowly over time. For the model used in this study, candidates include the regression slope β1, the species difference γ, or the age proportions θ.
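The recursive use of posteriors as priors can be illustrated with the simplest conjugate case. This is a simplified sketch rather than the WinBUGS model: it assumes a normal prior on a single parameter and a new estimate with known, normally distributed error. The function name is hypothetical.

```python
def normal_update(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal estimate; return posterior mean and s.d."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    return post_mean, post_precision ** -0.5

def recursive_update(prior_mean, prior_sd, estimates):
    """Feed each (mean, se) estimate in turn, reusing the posterior as the next prior."""
    mean, sd = prior_mean, prior_sd
    for data_mean, data_se in estimates:
        mean, sd = normal_update(mean, sd, data_mean, data_se)
    return mean, sd
```

In this conjugate setting, updating one data set at a time gives the same answer as a single update with the pooled data, and the posterior standard deviation can only shrink; this is the sense in which recursive updating improves precision for parameters that are constant over time.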
In summary, for this particular application, mixture models of the frequency distribution of ELSD have been found to provide far better estimates of species composition than those using a TS threshold. Further improvement requires a better understanding of the factors that influence echo-envelope characteristics. The present studies suggest that these factors may include fish-packing density (number m⁻³), fish behaviour, and signal-to-noise ratio. Their influence on the estimates of species composition reported in this article is currently being investigated.
More generally, we suspect that the modelling approach described in this article may prove useful for other hydroacoustic applications and other discriminating variables, e.g. dorsal-aspect TS studies. Explicit consideration of fish-size distributions and the variability of hydroacoustic measurements allows the extraction of maximal information from the data. Fish-size information is often available from auxiliary sources, although gear selectivity may need to be considered. Estimates from such an approach avoid many of the problems associated with the use of thresholds for species classification. Even when the approach does not appear to work, e.g. the poor fit in Figure 3f, there is great value in being alerted to a situation that probably would have gone unnoticed had a simple threshold been employed. Modelling the frequency distribution provides much more information and insight than would otherwise be available. The Bayesian version, in particular, provides a very powerful and intuitively satisfying way to synthesize information from multiple sources.
The key requirement for successful estimation with mixture models is that the composite distribution shows recognizable modes for at least a subset of the data. Clearly defined modes may even remove the need for empirical estimates of the relationship between the hydroacoustic measurement and fish size, i.e. it may be possible to estimate the regression parameters indirectly without tethered-fish or similar experiments. Other adaptations of this approach are possible. For example, it could be extended to utilize multivariate data or to discriminate between more than two species.
This work was supported in part by the US Federal Aid in Sport Fish Restoration Act. Computer code (Excel®, SAS®, and WinBUGS) for the analyses presented in this article is available from the corresponding author.