Bayesian Design with Sampling Windows for Complex Spatial Processes

Optimal design facilitates intelligent data collection. In this paper, we introduce a fully Bayesian design approach for spatial processes with complex covariance structures, like those typically exhibited in natural ecosystems. Coordinate Exchange algorithms are commonly used to find optimal design points. However, collecting data at specific points is often infeasible in practice. Currently, there is no provision to allow for flexibility in the choice of design. We also propose an approach to find Bayesian sampling windows, rather than points, via Gaussian process emulation to identify regions of high design efficiency across a multi-dimensional space. These developments are motivated by two ecological case studies: monitoring water temperature in a river network system in the northwestern United States and monitoring submerged coral reefs off the north-west coast of Australia.


Introduction
Sampling designs that optimise one or more specified utility functions are valuable in many applied contexts, since they facilitate the intelligent collection of data.That is, an optimal design results in greater information efficiency, reduced sampling cost and improved estimation.However, such schemes remain challenging to develop for spatial processes with complex covariance structures, such as those found in natural arXiv:2206.05369v1[stat.ME] 10 Jun 2022 ecosystems.Moreover, sampling at the exact design point can be difficult in practice within these systems due to practical challenges associated with the collection of samples and accessibility of the sites.In this case, a more flexible sampling scheme that identifies windows with near-optimal design efficiency can be highly useful.
Data collected by environmental monitoring programs are essential for understanding patterns, trends and vulnerabilities in ecological and environmental systems.Here, we focus on river network and coral reef systems, and develop optimal sampling regions given non-standard spatial relationships.The covariance across locations on a branching river is particularly complex, since the network topology is embedded in a 2D terrestrial landscape, with directional water flow, and disparities between flow volume.Furthermore, monitoring programs are notoriously expensive in terms of monetary, human and technological resources (Lindenmayer and Likens, 2010;Roelfsema and Phinn, 2010).
Balancing short-and long-term requirements, as well as practical constraints of data collection can add further complication to the design process (Kang et al., 2016).Our aim is to develop a more flexible design approach within a Bayesian framework while achieving two goals: 1) the determination of a set of optimal sampling locations based on a discrete set of available locations, and 2) the formation of sampling windows within a specified region where difficulties due to access or otherwise may be encountered.
Experimental design problems are commonly viewed as optimisation problems.In the classical framework, optimal experimental designs are often derived using utilities based on the expected Fisher information matrix (e.g.Atkinson and Donev, 1992).Such designs have been shown to depend on assumed values for model parameters, so approaches have been proposed to remove some of this dependence.For example, Pronzato and Walter (1985) optimised a design criterion over a prior probability distribution for the model parameters; the so-called 'pseudo-Bayesian' design approach.While pseudo-Bayesian designs are more robust for the choice of parameter values, the prior information is not utilised in subsequent analysis (i.e.computation of the utility).A fully Bayesian framework provides a unified approach for incorporating prior information and rigorously handling uncertainty about the model parameters and the statistical model when forming a design and analysing the resulting data (see Chaloner and Verdinelli (1995); Ryan et al. (2016)).
Various approaches to finding sampling intervals, so-called windows, are presented in the statistical literature relating to pharmacokinetics (e.g.Bogacka et al. (2008); Foo et al. (2012); McGree et al. (2012)).These intervals provide flexibility in the timing of sample collection, while assuring a high level of information efficiency.An approach for stratified random sampling in a geostatistical simulation study by Lin et al. (2011) uses Latin Hypercube sampling to find the optimal sub-sampling of pre-specified regions (for a given spatial stratification).However, to our knowledge no approaches have been developed to form sampling windows for spatial processes in the statistical literature.
We present a Bayesian approach, using a Gaussian process emulator to approximate the utility surface in high-dimensions.Our method builds on the approach to stochastic optimisation employed in the Approximate Coordinate Exchange (ACE) (Overstall and Woods, 2017) algorithm.We also demonstrate how to derive design efficiency contours using sampling windows, which can then be used to guide highly efficient and flexible sampling of complex spatial processes.The new approach is applied to substantive real-world case studies, described below.

Motivational Case Studies
Example 1: Water temperature is considered a "master" variable in river ecosystems because it affects the metabolic rates of aquatic organisms, life history events related to reproduction, hatching and maturation, and restricts the distribution and abundance of ectothermic species (Isaak et al., 2017).Water temperature is often monitored to understand the impacts of climate change and land management on thermal habitats (Isaak et al., 2016;Jackson et al., 2018) and is also used as the basis for regulatory actions (Todd et al., 2008).The recent proliferation of inexpensive in-situ sensors has dramatically increased the potential for monitoring and data-enabled management decisions (Isaak et al., 2017).Nevertheless, comprehensive monitoring on river networks using in-situ sensors remains costly and complex in terms of balancing short-and longterm priorities.Optimal sensor placement must be assessed when sensors are initially deployed, new sensors are added to an established monitoring program or the number of in-situ sensors must be reduced.In addition, field conditions can be unpredictable; some sites are inaccessible due to steep terrain, a lack of network reception for data streaming, or private landholders refusing access.This motivates two considerations: the design of a global optimal sampling scheme across the river network, and a more targeted approach to identify finer-scale sampling regions that provide near-optimal design efficiency.
In river networks, spatial relationships are characterised in part by their topology (e.g.branching network structure, connectivity).There have been numerous advances in statistical modelling used to describe spatial covariance in data collected on branching river networks based on distance travelled along the river, directionality in water flow and differences in flow volume (Ver Hoef et al., 2006;Ver Hoef and Peterson, 2010;Peterson et al., 2013;Santos-Fernandez et al., 2021).The ability to model this spatial component and more accurately predict conditions throughout a river network often provides a better understanding of habitat suitability (Isaak et al., 2016) and leads to more effective management decisions (Sharma et al., 2021a,b).Frequentist sampling methods for river networks have been developed (Som et al., 2014;Wang et al., 2020), as well as pseudo-Bayesian approaches (Falk et al., 2014;Pearse et al., 2020).However, a fully Bayesian design framework for river networks is currently unexplored.
Example 2: Coral reefs are inextricably linked by the symbiotic relationships formed between corals and the organisms around them (Richmond, 1993).In particular, deep reefs have the capacity to act as refugia and to assist in the recovery of more vulnerable shallow water reefs following a disturbance.However, major challenges remain in understanding the extent of and managing the vast marine estate for Australia, as well as for many other maritime nations.Insights into remote reefs are beginning to emerge due to data captured using new technologies such as remote and autonomous underwater vehicles.With limited resources to invest in research and monitoring, there is a need to optimise in-field activities to collect data that will most efficiently achieve research and management goals.
Hard coral cover is an indicator commonly used to infer the health and condition of a coral reef (AIMS, 2021).In this example, we consider deep reefs, or mesophotic coral ecosystems, with data collected between 12 to 50 metres in depth.The extent to which important ecological processes change along depth gradients in a reef is not well understood and is a critical knowledge gap for developing future reef policies and management practices (Kang et al., 2016).Monitoring coral reef environments often relies on a series of images taken underwater, along a transect line.As such, collecting data from submerged shoals can be costly given the specialised skills of those undertaking the monitoring and the cost of equipment and running a ship.Motivating our second design scenario, we consider the optimal placement of transect lines across a submerged shoal taking into account the spatial nature of the process.In practice, underwater sampling along these lines can be challenging and unpredictable due to water currents.
Thus, a more practical solution would be to define sampling regions, as proposed here, offering more flexibility and assurance in sampling.
The following section provides details about the general statistical modelling used in both case studies, then outlines specific details in each case.In Section 3, the Bayesian optimal design approach is described.Our proposed approach to finding sampling windows is given in Section 4, along with the concept of design efficiency contours.Finally, Section 5 illustrates the results of these two examples and demonstrates the scope of applicability of Bayesian design and sampling windows for spatial processes.

Bayesian Spatial Generalised Linear Models
We start with a general description of spatial linear models and then extend this to models for data collected on river networks and coral reefs.Let d = (d 1 , ..., d γ ) denote a design, that is, a set of locations in the physical or parameter space.Consider a linear mixed model with n × 1 response Y, and the n × p design matrix X of explanatory variables spatially indexed at locations, d, with spatially correlated random effects z ∼ N (0, Σ z ) and independent residuals iid ∼ N (0, σ 2 0 ) yielding a covariance of Σ = Cov[Y] = Σ z + σ 2 0 I where I is the n × n identity matrix and σ 2 0 is the nugget effect.Typically, we express Σ z as a function based on distance h between locations d, and covariance parameters, θ z .We define the set of all parameters as θ = (β, θ z ) .
There are many covariance models that describe how the dependence between observed data at two spatial locations decays with distance.A common example is the exponential function, where the partial sill, σ 2 1 , represents the magnitude of variance and the range parameter, α, describes how fast the covariance decays with an appropriate measure of distance between the locations, h.
The generalised linear modelling (GLM) framework provides a unified approach to analyse data for which some function of the mean response (link function) may vary linearly with the predictors rather than the mean response itself.We denote the linear predictor by Xβ = η = β 0 + p j=1 X j β j .A link function g(•) relates the linear predictor to the mean of the outcome variable g(E(Y|X)) = η.In a GLM, Y is assumed to be generated from a distribution in the exponential family, where the probability density (or mass) function is written as, where for observation i, ψ i is the location parameter, φ is the scale parameter and a i (•), b(•) and c(•, •) are known functions.
Example 1: Model for River Networks Consider data collected on a river network exhibiting complex patterns of spatial dependence due to multi-scale processes occurring within and between the aquatic and terrestrial environments (Peterson et al., 2013).Two points d k and d s on the river network are said to be flow-connected if water flows from an upstream site to a downstream site.They are flow-unconnected if they reside on the same river network and share a common junction downstream, but do not share flow.Following Ver Hoef and Peterson (2010), we estimate autocorrelation between points d k and d s by constructing random variables as the integration of a moving-average function over a white-noise random process.The tails of these moving-average functions are unilateral and spatial autocorrelation only occurs when the tails of the functions overlap.If the tail points upstream, it is referred to as a tail-up model and by construction, these models restrict correlation to flow-connected locations.Stream networks are dendritic and so weights are used to proportionally allocate (i.e.split) the tail-up moving average function at river junctions.Tail-up models are particularly useful for modelling organisms or materials that flow passively downstream (e.g.water pollutants).Conversely, when the tail of the function points downstream it is referred to as a tail-down model.These models allow spatial correlation between both flow-connected and flow unconnected locations and may be more suitable for modelling organisms such as fish that actively move both up and downstream (Peterson and Fausch, 2003).
A mixed modelling approach is often used for modelling river network data since it allows for more complex patterns of spatial dependence to be described within a single model (Peterson and Hoef, 2010).Variance components are used to expand the spatially dependent random variable z into several zero-mean random variables, with correlation structure described using Euclidean (EUC), tail-up (TU) and tail-down (TD) covariance models.As the notation suggests, the general covariance matrix can be formulated by a combination of such covariance models, where h EU C is the Euclidean distance and h H represents the hydrological distance, that is, distance along the river.Note that, distance considered can depend on flow-connectivity of sites and the appropriate covariance function but for brevity we denote this as h H , see The generalised linear mixed model for river networks we consider is of the form, with link function as the identity g(µ) = Xβ and Σ as defined by Equation ( 4) at a collection of locations d on the network.
Example 2: Coral Reef Modelling Estimates of hard coral are generally obtained using a number of images, N , of the reef floor at locations s k collected along a transect.In order to define transect placement, design parameters are introduced: the midpoint of a transect using Easting and Northing coordinates, E j and N j , the angle of the transect, α j in degrees, and the length of the transect, l j in meters (m).Transects can therefore be fully specified by design parameters d j = (E j , N j , α j , l j ) for transect j (see Figure 1).Let s k = (e k , n k ) denote the geographic coordinates of image k specified by a collection of transects d.Another parameter, r j , is introduced to create a transect region.For radius r j > 0, points disperse across the For each image a number, n k , of randomly selected points are placed on image, k, and classified as hard coral, or not.The number of points within an image that contain hard coral, y k , are assumed to have a binomial distribution where q = E(Y|X, θ, z), with spatially correlated random effects z ∼ N (0, Σ z + σ 2 0 I) where the matrix Σ z has entries of the form in Equation (2) or some other covariance matrix.The expected value of coral cover q k on image k leads to the specification of a logistic regression model with link function denoted by g(q) = log(q/(1 − q)), so that (E j , N j , α j , l j ) (left).A transect region with radius r j is shown, where data is collected at the stochastic sample locations ŝk within the region (right).

Bayesian Framework
Let p(θ) denote the prior distribution on θ, and consider the joint posterior distribution, where p(y|d) = θ p(y, θ) dθ = θ p(y|θ, d)p(θ) dθ is the marginal distribution of y.Unless the likelihood and spatial random effect are analytically tractable, that is, z p(y|β, d, z)p(z|θ z ) dz = p(y|θ, d) has a closed form solution, the spatial random effect must be integrated out numerically.We describe the approach taken in the next section.

Bayesian Optimal Design for Spatial Models
Optimal design problems are commonly viewed as an optimisation framework with respect to an experimental aim.A utility function is defined to quantify the worth of a design, which may be to control systematic error (bias), reduce random variation, increase the precision of parameter estimates, make predictions about future observations or discriminate between competing models.Formally, we denote the utility function as u(d, θ, y) which depends on model parameters θ ∼ p(θ) and the data we might observe under that statistical model y ∼ p(y|θ, d) at design points d from the design space D.
We establish the notation for a design as follows, d i = (d 1 i , ..., d γ i ), and use d j i to refer to the jth coordinate of design d i throughout the paper.
The aim is to maximise the expected utility, U (d) = E[u(d, θ, y)] over the entire parameter space Θ and all conceivable data sets Y.The Bayesian optimal design d * can therefore be expressed as, Unfortunately, the integral in Equation ( 8) is analytically intractable for most applications, and indeed the linear mixed-model in Equation ( 1).In practice, a commonly used method is Monte Carlo integration, where the expected utility can be approximated by the empirical mean, that is, with draws from the prior θ (m) ∼ p(θ) and then the likelihood y (m) ∼ p(y|θ (m) , d).In order to accurately estimate Û (d), M needs to be large.In some cases, the median m) , y (m) )) may be preferred to the arithmetic mean in Equation ( 9), since it is less sensitive to tail behaviour when the target distribution is asymmetric.
We explore this comparison further in the following sections.
A commonly used utility function is the Kullback-Liebler (KL) divergence (Kullback and Leibler, 1951) of the posterior from the prior distribution, The intuition behind Equation ( 10) is that a large D KL divergence from posterior to prior implies that the data in y reduces entropy (randomness or uncertainty) in θ and hence the data are more informative at design points d.As suggested by Lindley (1956), this utility can be used in the interest of maximising the expected information gain on the model parameters (or functions of these) when performing experiments at design points d.When both the prior and the posterior distributions follow multivariate normal distributions, that is, θ ∼ N (µ 0 , Σ 0 ) and θ|y, d ∼ N (µ 1 , Σ 1 ), respectively, then the KL divergence becomes where µ 0 , µ 1 ∈ R κ and Σ 0 , Σ 1 ∈ R κ×κ (Duchi, 2007).
Note that in Bayesian design, the utility is always a function of the posterior distribution, which generally means that inference is performed many thousands of times since the posterior p(θ|y, d) must be evaluated for each future data set {θ (m) , y (m) } that is drawn from the joint distribution p(θ, y|d).As a computationally efficient approximation to the posterior distribution, we employ the Laplace approximation which has the following form: where θ * = arg max θ∈Θ {log p(y|θ, d) + log p(θ)} and H(θ * ) is the Hessian matrix defined as: This approximation requires evaluation of the full data likelihood.Thus, the full data likelihood used in forming the Laplace approximation is replaced with a numerical approximation to the posterior for θ, Finally, an approach for the maximisation required in finding the Bayesian optimal design in Equation ( 8 with large B and random draws {θ (l) , y (l) } from p(θ, y|d).However, this acceptance criteria relies on the assumption of a normally distributed utility (at least approximately).
We propose an alternative version of the stochastic Coordinate Exchange algorithm (see Appendix Algorithm 2) with a non-parametric acceptance criterion defined as where W is the p-value of the one-sided Wilcoxon rank sum test (equivalent to the Mann-Whitney test).In this case, the null hypothesis is that the distributions of the proposed design utility, U (d j ), and the current best design utility, U (d), differ by a location shift of µ = 0 and the alternative hypothesis is that µ > 0. The 'one-sided' specifies that U (d j ) is shifted to the right of U (d).We explore the proposed non-parametric stochastic Coordinate Exchange algorithm in comparison to the ACE in the Appendix, both with median and mean central tendency measure.

Sampling Windows
As mentioned previously, optimal design methods have been used to determine sampling windows for experiments in pharmacokinetics (Foo et al., 2012;Bogacka et al., 2008).For spatial processes, sampling windows would allow the experimenter some flexibility in the sampling location, while preserving a minimum required efficiency for the design utility.
In order to define sampling windows, we define the utility in a continuous domain.
Consider the design space D ⊆ R q across q windows.We first introduce Gaussian processes (GP) and illustrate their flexibility and capability as a multi-dimensional utility smoothing technique.Based on the smoothing GPs provide, we then show how design efficiency contours can be derived.
GPs generalise the concept of Gaussian distributions over discrete random variables to the function space.In the Bayesian context, GPs can be seen as a prior over the function space, where inference takes place.Let f (x) denote the target value of interest where x ∈ R q is a q-dimensional random vector containing predictor values such as spatial locations.We write GP(m(x), k(x, x )) to denote a GP characterised by a mean function m(x) and a covariance function k(x, x ).Consider a GP model with kernel, with hyperparameters ζ and distance measure defined as dist j (x p , x s ) = |x j p − x j s |, for j = {1, ..., q}.Using a finite collection of inputs {x 1 , ..., x n } and the function above, we construct the kernel matrix K(•) over the parameter space.
Details of the proposed Sampling Windows Algorithm are presented in Algorithm 1, using design notation.We begin by defining a design space for windows (line 2), this may differ from the design space searched in the discrete stochastic CE algorithm.We fit a GP (line 4) to the criterion of interest, here, the median utility Ũ (d i ) computed at each input d i (line 3), assuming a zero-mean GP prior, f ∼ GP(0, K(D, D) + ζ 0 I).
Inclusion of the nugget ζ 0 > 0 ensures that every f has some magnitude of variance with itself only (independent noise), so that the Monte Carlo approximations of the median utility surface will be smoothed, not interpolated. where The posterior predictive mean of f used as an emulator (in line 6) is, for arbitrary inputs d derived using standard results on the conditional distribution of normal random variables.Finally, computing the design efficiency (line 7) is achieved by normalising the functional utility to where d j = arg max

Example 1: River Network Design
The design problem in this case study is motivated by a costly and complex balance of priorities, that is, short-term interest in monitoring particular river segments (e.g.wild-  (Shreve, 1966).The optimal and non-optimal design locations using the proposed stochastic Coordinate Exchange search are represented by filled and unfilled dots, respectively.
fire impacts on juvenile fish habitat), while maintaining the ability monitor efficiently at established locations throughout the region.In this example, we use a water temperature dataset from the Clearwater River Basin, USA, with observation sites located from 500m up to 46km apart (Figure 2).Temperature data were collected at 15 spatial locations using in-situ sensors that recorded measurements at 30 minute intervals (Isaak et al., 2018).The response variable is mean temperature for July 1, 2013, with stream slope, elevation, watershed area, and air temperature used as covariates (Santos-Fernandez et al., 2022).Note that spatial variation in temperature often constitutes the largest proportion of total variation, since temporal correlation among sites is strong (Isaak et al., 2017).A tail-down covariance function was found to be the most suitable covariance function for describing this spatial dependence.
There are 15 existing monitoring sites in the Clearwater River Basin and the design problem is to find the best 5 among them.In addition, wildfires occurred near two segments that provide cold water refugia for juvenile fish and sensors are to be deployed to monitor impacts on water temperature.For convenience, we define search neighbourhoods along the river network, that is, any continuous line along the branching river network and specify Neighbourhood 1 (N1) and Neighbourhood 2 (N2) in Figure 4.
The stochastic Coordinate Exchange approach outlined in Algorithm 2 (Appendix) was computed with K = 5 random starts and T = 15 outer iterations.Figure 5.1 shows a selection of the utility distributions found in the proposed search algorithm, then the distributions of the optimal design, noting the non-normality of these distributions.The optimal design was found to be at locations, d * = (167,169,172,174,183) with a median utility of 3.6805.These optimal locations cover a wide range for each of the covariate values, in particular, optimal locations include the maximum and minimum value for air temperature.The set of locations (167,169,174) are clustered at a river confluence and the most downstream location of all sites, 169, is included in the design (Figure 2).These characteristics agree with optimal designs found using previous pseudo-Bayesian approaches for river networks Falk et al. (2014).We also benchmark the performance of the proposed Coordinate Exchange algorithm against other variations; a comparison of acceptance criteria (Wilcoxon vs. ACE), as well as central tendency measure (mean vs. median) is provided in the Appendix.The choice of the median as a preferred measure is supported by the apparent skewness and long tails of the distributions of the utilities (Figure 5.1).
Recall the specification of neighbourhoods N1 and N2 within the river network for sensor deployment.The proposed Sampling Windows Algorithm (Algorithm 1), is applied.Consider the design space D ⊆ R 2 with design points The current design d c is set as the optimal points d * from the downsizing regime previously found.In Phase I, the median utility is computed as Ũ (d i |d c ) = Ũ (d i ∪ d c ) for each d i in the grid, D, sampled from neighbourhoods 500m apart.Covariate values at d i for air temperature are interpolated using 3 nearest neighbours based on Euclidean distance, whereas the remaining covariate values (stream slope, elevation and watershed area) are unique to each river segment.In Phase II, a GP is fit assuming a kernel function defined by Equation ( 13), where the distance measure , 2 is equivalent to stream distance (along dimension j).Results from Phase III indicate that N1 is more sensitive to location placement than N2, as shown in Figure 5.
From an applied standpoint, these design outputs allow decisions to be made in the field (e.g.placement around specific location n 15 in N1, but anywhere convenient on the main branch in N2).This provides support for greater flexibility in sensor placement when access issues arise, without having to rerun a full discrete optimal design search.
A practical approach, given the design efficiency contours as a roadmap, would be to visit N1 first; should n 15 be inaccessible then move along the river to deploy the in-situ sensor where possible, say n 14 .Take the line segment corresponding to ef f (D * |n 14 ) to inform deployment in N2, in this case, n 24 , to retain 98% optimal utility.

Coral Reef Design
The Australian Institute of Marine Science (AIMS) has monitored the condition of coral reefs around Australia for decades.We considered data collected in 2013 from the Barracouta East submerged coral reef off the north-west coast of Australia (Heyward  et al., 2013).The design problem is to find the best regions to sample underwater images in.We propose a two step approach using the methods described above: first, a global search for the optimal transect lines with n = 3, the total number of transects, using the proposed stochastic CE algorithm in a discrete design space; then, a continuous search for the optimal radius of each transect region with q = 3, the number of sampling windows centred at each transect.
Coral cover is assessed using images of the reef floor taken every 5m along a transect, where n k = 20 random points on image k are analysed, leading to the consideration of the logistic regression model described in Equation ( 6).Exploratory analyses showed a strong linear relationship between the response variable, q, and depth of the reef floor which was used as the only covariate.Note that depth was interpolated using 3 nearest neighbours based on Euclidean distance from the survey data.To reduce the design space and thus the complexity of the search, angles are discretised to α j = {0, 45, 90, 135}, E j and N j are defined by a discrete set of points.Length remains fixed at l j = 500 so that each transect compromises of 100 images.
The proposed Stochastic Coordinate Exchange algorithm with the Wilcoxon acceptance criteria and median utility (Algorithm 2) was applied to find the optimal transect lines across the shoal.When computing the utility, the Laplace approximation to the posterior requires evaluating the full data likelihood, as shown in Equation ( 12).Accordingly, the spatially correlated random effect, z, needs to be integrated out.For this, we chose to use Monte Carlo integration with M z = 50 draws from the marginal likelihood due to computational restraints.The Monte Carlo integration for the median utility in Equation ( 9) is evaluated with M = 350 and B = 600 draws from the prior and full likelihood for the evaluation of Û (d j ) and p * W , respectively.
Spatial covariance is characterised by exponential decay, defined in Equation (2).
To further ease the computational burden, we construct a spatial grid (10 × 10) over the entire shoal, with h EU C the Euclidean distance between grid centre points (a lowerdimensional surrogate to the N × N distance matrix using coordinates x k ).This grid size is a compromise; not so small that the computational gains are marginal, but not so large that the spatial relationships are not captured between grid rectangles.Results in Figure 6 show the optimal placement of transects d * = (d 1 * , d 2 * , d 3 * ) in the shallower areas of the reef, each with varying depth gradients.Thus, it would seem reasonable to sample in along these transects to estimate coral cover.
The proposed Sampling Windows Algorithm (Algorithm 1) is applied over the parameter space r, with the kernel function defined by Equation ( 13) and distance dist j (r s , r p ) = |r j s − r j p | for j = 1, 2, 3.In the first step, the optimal transect design previously found is fixed, while the design space D ⊆ R 3 is constructed for radius parameter, r, corresponding to each transect.Since stochasticity is introduced in the choice of geographic locations, ŝk , we average median utility across 3 location samples for a given radius, each with M = 600 Monte Carlo draws from the prior and likelihood.The 3-dimensional design contours are shown in Figure 7. Optimal design efficiency, ef f (D * ) = 101%, is achieved at r * = (44, 0, 44).This implies optimality when sampling within a region for the shallower areas, and more strictly across the transect d 2 * in the deeper area of the reef.Generally, these contours can be used to guide decision making on the field, for example, sample d 2 * when conditions are good, whereas sampling in d 1 * and d 3 * can be done with stronger underwater currents.

Discussion
Coral reefs and freshwater ecosystems are biodiversity hotspots that are particularly sensitive to anthropogenic impacts of climate change, water pollution and over-exploitation, among other factors, resulting in rapid population declines (Strayer and Dudgeon, 2010;Tickner et al., 2020;Wagner et al., 2020).Informative data are needed to support critical and timely conservation decisions, but various practical constraints make precise sampling difficult or even impossible.In this paper, we have considered Bayesian optimal design for collecting data where greater flexibility is needed in the design outputs.To this end, we proposed a spatial Sampling Windows Algorithm using Gaussian process emulation to estimate design efficiency contours.While we focused on two important environmental systems, such an approach is also applicable to a broad range of complex systems that require rigorous design.For example, optimal design methods for subsampling can be used as a computationally efficient way to analyse Big Data (Drovandi et al., 2017).Rather than extracting optimal design points from the Big Data set, which may or may not exist, points could be selected within windows.
There are several novel aspects to the approaches described here.Statistical models for branching river networks use distance travelled along the river network, directionality, and flow volume to describe spatial relationships.There is novelty in the application of Bayesian design for river network systems, as of yet to be presented in the statistical literature.Since utility distributions were not normally distributed, we also proposed the use of a non-parametric acceptance probability in the stochastic Coordinate Exchange, based on the one-sided Wilcoxon Rank Sum Test.We successfully demonstrated how Bayesian design works in complex spatial settings and introduced a method to find sampling windows, facilitating intelligent data collection.
The novel methodological advances presented here provide practical solutions to common monitoring challenges.Results from the proposed Sampling Windows Algorithm highlighted in the river network case study showed that design efficiency was best achieved around a smaller range in Neighbourhood 1, while almost agnostic to choice in Neighbourhood 2; recalling that neighbourhoods are any continuous line along the river.
In particular, the design efficiency contours identify which, and quantify how much, design points are sensitive to precise sampling.Similarly, results in the coral reef case study showed high design efficiency around regions in shallow areas of the reef compared to deeper regions with less hard coral.Providing fine-scale information about optimality allows practitioners in the field to make informed judgements on jointly optimal variables based on physical conditions not included in the utility (e.g.accessibility, safety, etc.), without the need to re-run time consuming optimisations.
There are also numerous opportunities for future research in this space.One useful extension would be to incorporate additional monitoring objectives related to cost into the utility function.For example, the optimal deployment of a combination of high-cost sensors that produce high-quality data and lower-cost sensors that produce less reliable data.Sampling and monitoring costs are a core consideration for any monitoring program, but it is especially true in remote locations.In such cases, it would also be useful to consider the travel time between sampling locations in the optimal design.New Bayesian spatio-temporal models for river networks (Santos-Fernandez et al., 2021) provide the opportunity to extend the approach developed here for spatio-temporal sampling windows.This would allow practitioners to make more effective use of mobile in-situ sensors for spatio-temporal data collection.Thus, the novel methods presented here (along with future extensions) facilitate more efficient data collection; providing insights into spa-tial (and spatio-temporal) processes which can be used to better inform data-enabled management decisions in complex and vulnerable ecosystems.

A.1. Covariance on River Networks
Again, two points d k and d s on a river network are said to be flow-connected if they share water flow, and flow-unconnected if they reside on the same network and do not share flow.The tail-up covariance between locations separated by stream distance h H can then be defined as follows: and C EU C are matrices derived from the above covariance functions, respectively.

A.2. Stochastic Coordinate Exchange Algorithm
In Algorithm 2, an initial random design d 0 = (d 1 0 , ..., d γ 0 ) is chosen.In the inner-most for loop, coordinate j in the design d is swapped for a point in the design space D \ d, denoted by d j .With each swap, the median utility is computed.Then, the best found utility Ũ (d j ) is compared to the current best utility U best via the acceptance criteria.If accepted, the best design is updated.There are K random starts to mitigate convergence  (167,169,172,174,183) to a local maxima, which can be computed in (embarrassing parallel) fashion.In our example, the Wilcoxon and ACE acceptance criteria resulted in the same optimal design, with respect to both the mean and median utility CE algorithm (Table 16).It is noted that with more uncertain priors or multi-modal utilities, the comparative results between acceptance criteria may be more pronounced.Both optimal designs, and indeed most of the best designs for each random start, contain the points at confluence (167,169,174).
A.3.Median Utility: MCMC vs. Laplace In general, the form of the posterior is unknown, and therefore difficult to estimate directly.In such cases, algorithms like Markov Chain Monte Carlo (MCMC) (e.g. Metropolis-Hastings algorithm) are employed to draw samples which are used to approximate the posterior distribution (Hastings, 1970).
To efficiently approximate the expected utility in Bayesian design, fast methods for approximating the posterior distribution are required.For this, we consider the Laplace  2020).There are some advantages in using the Laplace approximate method compared to numerical Markov chain Monte Carlo (MCMC) methods.Besides the huge gains in computational efficiency for lowdimensional design problems, there is no issue of choosing a suitable proposal density, for which to tune for a desired acceptance rate, and similarly no need to ensure convergence of a Markov chain, as well as no need to determine the length of a suitable "burn-in" phase.However, the Laplace approximation is limited to the assumption of an approximately normal target distribution, has been shown to underestimate posterior variance (Shun, 1997).
Therefore, we compared the median utility, Ũ (d), using the Laplace approximation vs. an MCMC (Metropolis-Hastings algorithm) approximation to the posterior for 50 random designs.
Santos-Fernandez et al. (2021) for further details.See Appendix A.1 for the specification of the covariance matrices C EU C , C T U and C T D , as well as distance h H .

Fig. 1 .
Fig. 1.Data is sampled at each geographic point s k along the transect defined by d j = ) needs to be specified.It is common to use a Coordinate Exchange (CE) algorithm to search a discrete space, by exchanging coordinates in the design, d, with other points in the design space, in a sequential manner.Formally, denote a proposal design by d j = (d 1 , ..., d j−1 , d , d j+1 , ..., d γ ), obtained by exchanging the jth design coordinate for some other design point d ∈ D \d.Given the stochastic nature of the utility approximation, illustrated by draws of the prior and likelihood in Equation (8), the search algorithm needs to accommodate different realisations of Ũ (d).For this, an acceptance probability p * can be used to compare a proposed design d j to the current best design d.The Approximate Coordinate Exchange (ACE) algorithm (Overstall and Woods, 2017) is a commonly used stochastic CE algorithm with acceptance probability defined by Hyperparameters ζ = (ζ 0 , ..., ζ κ ) are determined (line 5) by minimising cross-validation error of the GP, ζ = arg min ζ CV (ζ|f, D), with (14)

368
j |D, ζ).Output design efficiency contours (line 8) as the set of prediction points above threshold value t.Algorithm 1: Sampling Windows Algorithm 1 If existing, let d c = (d c1 , d c2 , ..., d cγ ) be the current design.Specify Ũ (d|d c ) as appropriate and define prior p(θ) and likelihood p(y|θ, d).Phase I -Construct q-dimensional utility grid 2 Define design space D ⊆ R q across q windows, sample D = (d 1 , ..., d γ1 ) with d i ∈ D for training and D = (d 1 , ..., d γ2 ) with d j ∈ D for predictions.Compute median utility Ũ (d i ) for each point d i Phase II -Functional utility surface 4 Fit a GP; hyperparameters are determined by minimising cross-validation error 5 ζ = arg min ζ CV (ζ|f, d) with CV defined in Equation (15) Evaluate predicted mean values f at each prediction point d j as f (d j |D, ζ) = γ1 i=1 k(d j , d i | ζ)α i for α = [K(D, D) + ζ0 I] −1 Ũ (D|d c ) Phase III -Design Efficiency contours 7 Normalise utility surface to yield efficiencies ef f (d j ) as per Equation (17) for each prediction point d j Output set of points where ef f (d ) > t for threshold values t = [0, 1]

Fig. 2 .
Fig. 2. A total of 15 observation sites located in the Clearwater River Basin, USA.Average daily water temperature for July 1, 2013.The width of the stream line is proportional to Shreve's stream order(Shreve, 1966).The optimal and non-optimal design locations using the proposed

Fig. 3 .
Fig. 3. Trace plot for the coordinate exchange algorithm using Wilcoxon acceptance criteria and median utility (top-left); proposal probability using Wilcoxon acceptance criteria, p * W (bottomleft); every 50th utility distribution in the proposed CE algorithm (top-right); and the optimal utility distribution (bottom-right).

Fig. 4 .
Fig. 4. Existing sites (red dots) and potential sites within search neighbourhoods N1 and N2 (gray dots) located on the river network.A discrete set of points were sampled from both neighbourhoods spaced at 500m stream distance apart.

Fig. 5 .
Fig. 5. Phases of the proposed Sampling Windows Algorithm; constructed median utility grid (left), Gaussian process mean predictions (middle), and normalised contours of design efficiency (right).A sensor close to n 15 ∈N1 will yield high design efficiency, while more flexibility can be afforded for the choice of sensor placement in N2.

Fig. 7 .
Fig. 7.The median utility (left); Gaussian process mean predictions of median utility (middle), and normalised contours of design efficiency (right) found at each phase of the proposed Sampling Windows Algorithm.The radius parameter, r j , specifies a radius for transect d j * for j = {1, 2, 3}.

Algorithm 2 :416
Stochastic Coordinate Exchange Algorithm 1 Specify Ũ (d) as appropriate.Define prior p(θ) and likelihood p(y|θ, d).Define design space, D, as set of points with Γ = |D|. 2 for k = 1 : K do 3 Initial random design, d 0 Set d k = d 0 and evaluate U best = Ũ (d k ) with the jth coordinate exchanged with the ιth design point not in d k p * W = 1 − W (max( Ũ (d j k )), U best ), with larger B 12 If accepted, d k = arg max Ũ (d j k ) and U best = max( Ũ (d j k )) Evaluate Ũ (d k ) with largest B k , d * = arg max Ũ (d k )

Fig. A. 8 .
Fig. A.8. Stochastic Coordinate Exchange Algorithm trace plots using ACE acceptance criteria (left) and the proposed Wilcoxon acceptance criteria (right) for the expected KL divergence utility.
approximation to the posterior distribution, see Long et al. (2013); Overstall et al. (2018); Carlon et al. (2020); Senarathne et al. ( Figure A.9  shows the linear relationship between the Laplace and MCMC inference methods for the KL-divergence utility in the design problem outlined in Section 5.1.

Fig. A. 9 .
Fig. A.9.The median utility, Ũ (d), computed for 50 random designs using inference methods: MCMC (Metropolis-Hastings algorithm) vs. Laplace approximation.The optimal design found via the proposed stochastic Coordinate Exchange algorithm (Algorithm 2) is shown in blue.
unconnected, where W ks represents the spatial weights between d k and d s defined by the branching structure of the river network and C(h H |θ zT (Ver Hoef et al., 2006) exponential covariance function defined in Equation (2).The tail-down covariance between locations separated by stream distance h is defined by the exponential covariance function C T D (h H |θ zT D ) = C(h H |θ zT D ) noting that h H is the hydrologic distance separating d k and d s .Similarly, the covariance between locations based on Euclidean distance is C EU C (h EU C |θ zEUC ) =C(h EU C |θ zEUC ).Note that, the exponential model is the only known covariance function that produces a statistically valid covariance matrix when Euclidean distance is replaced with total hydrologic distance(Ver Hoef et al., 2006).In matrix notation, C T U , C T D

Table 1 .
Optimal design scenarios for the river network case studyAcceptance criteria Utility measure Optimal design