The Baltic Sea scale inventory of benthic faunal communities

Mayya Gogina1*, Henrik Nygard2, Mats Blomqvist3, Darius Daunys4, Alf B. Josefson5, Jonne Kotta6, Alexey Maximov7, Jan Warzocha8, Vadim Yermakov9, Ulf Grawe1, and Michael L. Zettler1 Leibniz Institute for Baltic Sea Research (IOW), Warnemunde, Seestr. 15, 18119 Rostock, Germany Marine Research Centre, Finnish Environment Institute SYKE, PO Box 140, FI-00251 Helsinki, Finland Hafok AB, 179 61 Stenhamra, Sweden Coastal Research and Planning Institute, Klaipeda University, H. Manto Str. 84, LT-92294 Klaipėda, Lithuania Department of Bioscience, Aarhus University, Frederiksborgvej 399, 4000 Roskilde, Denmark Estonian Marine Institute, University of Tartu, Maealuse 14, 12618 Tallinn, Estonia Zoological Institute Russian Academy of Science, Universitetskaja nab. 1, 199034 St Petersburg, Russia National Marine Fisheries Research Institute, ul. Kollątaja 1, 81-332 Gdynia, Poland Latvian Institute of Aquatic Ecology, Daugavgrivas Str. 8, LV-1048 Riga, Latvia *Corresponding author: tel: +49 381 5197 393; fax: +49 381 5197 211; e-mail: mayya.gogina@io-warnemuende.de


Introduction
In the recent decades, there has been a marked increase in benthic habitat mapping, largely inspired by the development and implementation of various research and monitoring programmes, management plans, and legislations, such as the 1992 EU Habitats Directive. This has resulted in an exponential accumulation of data hitherto stored as different non-standardized national datasets. Here for the first time, we bring much of these datasets together and harmonize them to address management issues on the large Baltic Sea scale. Similar large-scale efforts were previously reported for the North Sea in Rees et al. (2007), for the Barents Sea in Anisimova et al. (2010).
The focus of our study was identification, description, and basin scale mapping of major benthic macrofauna community distribution in the Baltic Sea. A number of works were previously published, describing and mapping the benthic macrofauna communities in different subbasins of the Baltic Sea (e.g. Warzocha, 1995;Laine, 2003;Glockzin and Zettler, 2008;. However, we identified a lack of joint data analyses covering the entire Baltic Sea region based on new data collected since 2000. At the beginning of the last century, Petersen (1913) introduced the concept of infaunal communities for the description and quantification of areas with similar fish food. He aggregated the infaunal species in relation to broad geographic areas of relatively uniform sediments, depth, and salinity. Petersen (1913) described various soft sediment benthic macrofauna assemblages in different regions of the Baltic Sea. The scale of the investigated area and its heterogeneity allowed for high resolution of classification detail, and the definition of a number of communities or assemblages. Summaries on the distribution of fauna in the Baltic Sea can be found in works of early investigators such as Hessle (1924), Välikangas (1933), Ekman (1933Ekman ( , 1935, and Segerstråle (1957). Based on the findings outlined by Zenkevitch (1963), Schiewer (2008) listed the most important Baltic Sea coenoses or assemblages of species. These were (i) Abra alba-coenosis with Corbula gibba, Arctica islandica, Lagis koreni, Nephtys spp., Diastylis rathkei, and Ophiura albida which predominate in the western Baltic. Towards the east (ii) Arctica-Astarte assemblages were increasingly found. In the shallow part of the Baltic Proper (iii) Macoma baltica coenosis dominated, whereas in the deeper part (iv) Macoma calcarea coenosis (with Mya truncata and Astarte) occurred. Farther in the eastern soft-bottom areas (v) Saduria-Monoporeia-Pontoporeia coenosis with euryhaline species were found, and Gotland Deep, strongly affected by oxygen depletion, was colonized only by (vi) a species-poor hypolimnic community. One of the deepest basins in the Baltic Sea, the Landsort Deep, was characterized by long-term azoic, laminated sediments (Schiewer, 2008). Elmgren (1978) gave a description of the structure and dynamics of the Baltic Sea benthic communities from the Bothnian Bay down to the Baltic Proper and summarized habitats as "an almost uniquely simple benthic ecosystem". Carman and Cederwall (2001) provided an analysis of mean biomass data for the main subbasins of the Baltic Sea and attributed dramatic changes of benthic communities during the late 1960s to oxygen depletion, resulting in extinction of macrofauna in the deepest parts of the Baltic Proper.
Based on the updated data compiled in the HELCOM checklist, 2035 macrozoobenthic species are currently known to live in the Baltic Sea (HELCOM, 2013a). The species composition and diversity in the Baltic Sea are heavily shaped by the large salinity gradient ranging from 2 in the Bothnian Bay to over 20 in the Danish Straits. Zettler et al. (2014) reported a continuous decline in overall species numbers with decreasing salinity levels due to decreasing number of species with marine affinity. In contrast, freshwater species were most common in the northern parts and some low-saline coastal subregions.
In this study, we have collated and standardized sample data from all countries around the Baltic Sea to determine ecologically significant macrofaunal groups shaped by the pronounced environmental heterogeneity unique to the Baltic region. This inventory is to our knowledge the most extensive one to date from the Baltic Sea area. Our overall aim was to delineate major communities (sensu Petersen) inhabiting the Baltic Sea and to provide their distribution within the entire Baltic Sea area.
The study provides basic information on macrofauna distribution in the Baltic Sea area that can be used in management such as marine spatial planning and fishery management.

Joint dataset
The standardized dataset comprised over 1000 taxa from over 7000 locations (17 000 visit events) mostly sampled in period 2000-2013 for species abundance and partly for biomass ( Figure 1). Overall, 1176 taxonomic units (accepted in the World register of Marine Species WoRMS) were included in the final dataset. Over 70% of the samples were collected from May to July and therefore, the dataset is biased towards spring and early summer.
GIS databases were utilized for collating the distribution of benthic macrofauna covering the entire Baltic Sea basin. Most data (77%) were collected with a van Veen grab (covering a bottom area of 0.1 m 2 ), 1 -3 replicates per sampling were taken, sieved on a 1 mm mesh, preserved in 4% buffered formaldehyde-seawater solution, and identified in the laboratory to the lowest taxonomic level possible. Table 1 provides more details including specifications and the area covered by each laboratory. All species abundance and biomass data were recalculated to the area of 1 m 2 . Biomass records were available for 75% of the samples. Wet weight (including calcareous structures, without tubes) were only available for 2943 (39%) locations and ash-free dry weight for 2241 (29.7%) locations. Only wet weight were used for further analysis. Where only dry or ash-free dry weight data were available, in-house (IOW) biomass data conversion factors were used to calculate wet weight. The conversion factors are available upon request.
For the analyses, we used a 5 × 5 km grid (matching one-quarter of the European Environment Agency EEA 10 km reference grid size). Inner coastal fjords, estuaries, and lagoon regions were omitted from the analysis due to higher macrofauna variability, and because these areas contained a large part of freshwater organisms not representative for the broadscale picture of the entire Baltic. Average taxonomic unit abundances were calculated to account for all samples within one grid cell. Biomass records were obtained for 2268 of 5 × 5 km grid cells. Such aggregation of data neglects fine-scale spatial differences in the community structure and ignores temporal variation within the study period in the favour of capturing the most important large-scale patterns.

Community analysis
Cluster analysis (group-average linking) and nMDS (not presented here for brevity) were conducted to reveal similarities between stations. For multivariate analyses, we used taxa determined to the species level, where possible. Difficult taxa such as oligochaetes and chironomids were not identified to lower taxonomic level. Some characteristic Baltic Sea taxa were grouped to genus or even higher taxonomic level (Diastylis, Harmothoe, Exogone, Hydrobiidae, Idotea, Nephtys, Ophiura, Phoronis, Polydora, Diptera) to overcome taxonomic inconsistencies. For abundance-based analysis, only taxa that occurred at more than ten sampling events and contributed to the top 90% of the cumulative abundance per sampling event were retained. This reduction was done for better detection of the underlying community similarities and resulted in 272 taxa. For biomass analysis, taxonomic units were selected based on two criteria: either among top five species with highest biomass recorded at any sampling event or contributed to 80% of the cumulative total biomass. It resulted in the selection of 381 taxa. Abundance and biomass data matrices can be found in the Supplementary material.
All abundance data were fourth-root transformed, and biomass datawere log(x + 1)-transformed to account for the biasing effect of few high and low data values. Zero-adjusted Bray -Curtis similarities were calculated between stations. Clarke et al. (2006) suggested solving the problem of denuded samples in data (i.e. no common species among two sites) by using a dummy variable being equal to 1 for all objects allowing for the coefficient to be defined in these cases, thus avoiding bias. We used the dissimilarity matrix to perform a The Baltic Sea scale inventory of benthic faunal communities hierarchical cluster analysis (group-average). Community clusters were determined by selecting a distance where stations were grouped in well-defined clusters (arbitrary based on expert judgement). The classification results were tested with a one-way PERMANOVA based on the Bray-Curtis similarity matrix using 999 random permutations (Anderson, 2001). See Anderson and Walsh (2013) for the warning on usage of this test for unbalanced designs where heterogeneity of dispersions is known to occur. These analyses were conducted in PRIMER v6 (Clarke and Warwick, 2001).
Finally, we applied Similarity percentage analysis (SIMPER, Clarke and Warwick, 2001) and Indicator value analysis (IndVal  proposed by Dufrêne and Legendre, 1997) on the abundance and biomass data matrix to identify the main taxa which were responsible for differences in community structure. The IndVal method is implemented in R package "labdsv" (Roberts, 2015). IndVal is a measure of association between a taxon and a cluster of stations, calculated as the product of specificity (mean abundance or biomass of a given taxon within a cluster compared with the other clusters) and fidelity (taxon occurrence at stations belonging to a cluster). The maximum IndVal of 100% is reached when a given taxon is observed at all stations of only one cluster. Indicator species in contrast to species associations are indicative of a particular cluster of sites, i.e. good indicator species should be at the same time ecologically restricted to a target site group and frequent within it (De Cáceres et al., 2012). Statistical significances of indicator taxa were tested by 9999 random permutations of the samples and an a priory defined 0.05 threshold level was used.

Random Forests for spatial interpolation
Recently, a number of different modelling techniques have become increasingly used for benthic ecological studies (Wei et al., 2010;Collin et al., 2011;Reiss et al., 2011;Šiaulys and Bučas, 2012;Bučas et al., 2013). This study used the Random Forest algorithm developed by Breiman (2001) to generate full-coverage community maps. This method is implemented in the R package "randomForest" (Liaw and Wiener, 2002), a machine learning algorithm comprising an ensemble of randomly constructed decision trees (500 classification trees predicting class membership). A brief description of how the algorithm works can be found in Diesing et al. (2014). The model runs efficiently on large datasets, and can handle thousands of input variables without variable deletion. Moreover, the algorithm avoids the problem of overfitting. Another advantage of the Random Forest algorithm is that there is no need for additional cross validation or a separate test set to reach unbiased estimate of the test set error. This is internally estimated by the OOB (out-of-bag) error estimates, using a different bootstrap sample from the original data (one-third), during the construction of the tree. OOB data are also used to get estimates of variable importance. After building each tree, proximities are computed for each pair of cases. If two cases occupy the same terminal node, their proximity is increased by one. At the end of the run, proximities are normalized by the number of trees. Proximities are used to replace missing data, locate outliers, and to create low-dimensional views of the data. For this case study, it was essential that the algorithm balanced the error in class population unbalanced datasets. Due to the differing data sources used in this study, communities towards the entrance of the Baltic Sea are represented by considerably smaller number of sampling events and grid cells. Thus, no random stratified sampling design is possible to be achieved without a loss of spatial resolution. Elith et al. (2010) showed that results of such machine learning algorithms should be treated with caution if extrapolated outside of the training area. However, we assume that the spatial cover of our data is sufficient and that we are not going beyond interpolation.
To assess the accuracy of the spatial prediction, we derived measures of classification accuracy from an error matrix. These measures were overall OOB estimate, error rate, purity, and representation. Additionally, we applied Cohen's (1968) k coefficient of agreement calculated using "psych" R library (Revelle, 2015). This statistic measures inter-rater agreement typically used for categorical data to compensate for the chance agreement and is thought to be a more robust measure than simple per cent agreement calculation in this kind of study (Diesing et al., 2014).

Environmental variables
To support the spatial modelling, we used environmental variables, which represent direct or indirect linkage to the distribution of benthic macrofauna. We limited ourselves to datasets that were available for the entire Baltic Sea. We identified five major environmental predictors: bathymetric maps, near-bottom hydrographic fields (temperature, salinity, and flow speed), sediment type maps, and particulate organic carbon content of surface sediments.
We tested bathymetry datasets originating from the EU-BALANCE project (Al-Hamdani and Reker, 2007) and from Seifert et al. (2001). The latter showed to be more useful for our purposes, and was used to calculate the slope and indices of northness and eastness, describing the orientation of the slopes (based on first derivate of the bathymetry).
Hydrographic near-bottom fields of temperature, salinity, and flow speed were extracted from the model simulations of Gräwe et al. (2015), covering the entire North Sea and Baltic Sea for the period 1997-2015. The numerical model has a spatial resolution of 1 nm ( 1.8 km), data were provided as either monthly or annual means, minimum or maximum values. Additionally, we used monthly or annual standard deviation to quantify the natural variability. To account for extremes, monthly 5 and 95% percentiles were used. From the original dataset, only the period from 2000 to 2014 was used. The averaging of the hydrographical variables helped to reduce the variance in the resulting fields.
The only available data on seabed sediments covering the whole Baltic Sea region were produced by the EU-BALANCE project. The dataset was categorized in five categorical sediment classes: 1, bedrock; 2, complex sediments; 3, sand; 4, hard clay; 5, mud and clay. As these substrate data are available at a 200 × 200 m resolution, for model training, we down-sampled the data to the final 5 × 5 km grid. This was done by choosing the sediment class with highest coverage within the 5 km grid. Note that this strategy can lead to a mismatch between the dominating sediment class (within a 5 km grid cell) and the actual sediment type sampled for macrofauna. However, we believe that this error caused by the local heterogeneity occurs only at a small number of occasions and the entire Baltic picture aimed at should not be essentially biased.
Full coverage map of particulate organic carbon content in surface sediments with a resolution of 5 km was published by Leipe et al. (2011). All other variables were resampled before the prediction to match the sediment type data resolution to avoid information loss.

Estimate of total biomass distribution
To estimate the distribution of total benthos wet weight biomass over the Baltic Sea basins based on the available data, we utilized ordinary kriging on the log-transformed data averaged per 5 × 5 km grid cell. The resulting raster was then back-transformed to display biomass in gramme per square metre.

Communities based on abundance data
Based on the hierarchal cluster analysis of abundance data, we identified ten major communities inhabiting the Baltic Sea ( Figure 2). We placed a cut off at 23% of Bray -Curtis similarity, clusters with a minimum number of ten cells considered as major The Baltic Sea scale inventory of benthic faunal communities communities. It should be noted that we have chosen an arbitrary level of classification to provide an entire Baltic picture. We have tested various arbitrary cut-off similarity levels and decided on the reported one based on expert judgement. The higher the chosen similarity level, the larger the number of delineated groups (communities). Our aim was to determine a reasonable number of communities that can be well visualized and distinguished. Community structure differed significantly between the delineated communities (PERMANOVA F 10,2626 ¼ 175.32, p ¼ 0.001). The results of the indicator species and SIMPER analyses revealed that certain species contributed to observed pattern and helped to determine group distinctions. The characteristics of major communities and characteristic species are listed in Table 2 and the observed distribution aggregated to the 5 km grid cell scale is shown in Figure 3. Grid cells classified outside the major groups are treated as a separate class of "diverse/patchy/variable" community and will be referred to as "other" throughout the text. Distribution of the oxygen minimum zone was used to mask out regions with frequent absence of macrofauna (grey areas in Figure 3). A full-coverage map of the model prediction is presented in Figure 4.
Community 1 was the dominating type in the northern area. Characteristic species in community 1 were Monoporeia affinis, Marenzelleria spp., and Macoma balthica. Their abundance are well correlated with the occurrence of muddy to sandy sediments. There are some interspersed patches of community 2 found in sandy areas characterized by Hydrobiidae, Pygospio elegans, and Cerastoderma glaucum species. This community was mainly found in sandy habitats off Estonian west coast and in the southwestern Baltic Sea. Figure 3 shows group 4 in the Gulf of Finland and central Baltic down to Arkona Basin. Group 4 is strongly linked to the oxygen conditions in the deeper basins. Beggiatoa-periphytons occur frequently in these areas during periods of hypoxia or low oxygen content. After increase in oxygen levels, the motile Bylgides sarsi is the one to arrive first.

Communities based on biomass data
The hierarchal cluster analysis of biomass data identified 17 major communities in the Baltic Sea. Results are shown in Figure 5 (cut-off at 28%, clusters with minimum number of ten cells were considered as major communities, see Table 3 and Figure 6 for characteristic species). The significance of differences in species composition of 17 distinct communities was confirmed by PERMANOVA (F 17,2250 ¼ 91.255, p ¼ 0.001). Figure 7 shows full-coverage map resulting from the model prediction.
Biomass-based community 1 covers most of the northern areas. It appears on both muddy and sandy sediments and has Saduria entomon, Ma. balthica, M. affinis, H. spinulosus, Pontoporeia femorata as characteristic species. Community 2 is characterized by Hediste diversicolor, Mya arenaria, Hydrobiidae, and C. glaucum and can be classified as the third largest community, based on predicted spatial coverage. Community 2 mostly appears on sandy sediments and can be found in patches in the Archipelago Sea, in shallower parts of the Gulf of Riga, and on the edge of the Eastern Gotland Basin. However, the preferred habitats occur in the Pomeranian Bay, the Rügen-Falster-Plate, and in the northwestern part of the Sound. The second largest coverage was community 5,  which is characterized by B. sarsi and corresponds to abundancebased community 4 (i.e. macrobenthic community attributed to the area when conditions are oxic).
The accuracy estimates for the abundance-based classes are summarized in Table 4. The overall accuracy achieved by the spatial model prediction based on abundance data derived communities was 93%. However, attempts failed to predict community 10 as well as community assemblages named "other", where the representation was zero for both of these two classes. A slightly better, but still low, representation is achieved for communities 6 and 9 (see Table 2 and Figures 3 and 4 for respective details on characteristic species, observed and predicted distribution).  Table 2 for community characteristics, and Table 4 for accuracy estimates. This figure is available in black and white in print and in colour at ICES Journal of Marine Science online.
The Baltic Sea scale inventory of benthic faunal communities

1203
The accuracy achieved for the model based on biomass data was 92%. Table 5 shows the confusion matrix for the Random Forest biomass run-based classes (classes 1 -17 are recognized as communities 1 -17). Community 18 included all "other" pooled variable communities. Based on our available predictors, the model predicted community assemblages 1 -3, 5, 9, 11, 13-16 quite well (see Table 3 and Figures 6 and 7 for respective details on characteristic species, observed and predicted distribution).

Estimated distribution of total wet weight biomass
Interpolation of total wet weight biomass over the Baltic Sea basins derived using ordinary kriging are shown in Figure 8. Areas where biomass data were lacking produced interpolation artefacts: for example, the very low values at the shallow parts of the Eastern Gotland Basin at the west coast off Latvia.

Community analyses
The present study describes the major benthic communities inhabiting the Baltic Sea in terms of both abundance and biomass.
Community structure is often used to assess changes along environmental gradients. Abundance is often used as the response variable in community data analyses. Biomass, however, is generally used to indicate the functionality of a species within a community, as it is strongly correlated with metabolism (Saint-Germain et al., 2007). Biomass is also a decisive metric in recently developed biotope classifications (such as HELCOM HUB Underwater Biotope and Habitat classification system, Schiele et al., 2015), and is especially relevant for the implementation of Marine Strategy Framework Directive (MSFD) and other legislative directives.
The question of using either abundance (count) or biomass as a response variable always emerges in community analyses. Generally, this dilemma is an understudied question. Which biodiversity "currency" is more relevant for different aspects of ecosystem functioning, and what is the relative importance of the two in relation to the effect they have in ecosystem processes (see Norkko et al., 2013)? Communities themselves are not abundances or biomasses, but both, and a statistical workflow should be developed in the future to combine these two approaches.
The whole northern area of the Baltic Sea is dominated by abundance-based community 1, characterized by M. affinis, Marenzelleria spp., and Ma. balthica. This is expected as all the three species are typical for the Baltic Sea according to previous studies (Laine, 2003;Perus and Bonsdorff, 2004). Monoporeia affinis is a limnic species, with highest densities observed for salinities between 5 and 9, and only in rare occasions, it occur at stations with salinity values above 10 (e.g. . Marenzelleria spp. are among the most successful non-native benthic species in the Baltic Sea (Maximov et al., 2015). After first appearance in 1985, they quickly colonized the entire sea, now occupying a dominant position in the zoobenthos. Although the distribution of Ma. balthica is limited by salinity of 3 (Väinölä and Varvio, 1989), community 1, with Ma. balthica as a typical constituent species, reaches the whole Bothnian Bay and eastern part of Gulf of Finland. Some sources (e.g. Elmgren, 1978) imply that in contrast to most parts of the Baltic Sea where Ma. balthica is most common, bivalves are absent and meiofauna plays the most important role in the Bothnian Bay and the easternmost Gulf of Finland. Since we focused on soft-bottom macrofauna communities, the similarity analysis suggests that these areas still belong to that most typical community 1 cluster. Thus, the similarity of community structure despite the absence of bivalve species is still overall very high. However, if a finer level of classification was considered, these areas would indeed to be allocated into a separate class. Laine (2003) analysed large-scale distribution of soft-bottom macrofauna in the open Baltic Sea based on biomass data collected in 1996/1997, thereby representing the momentary state in distribution for that period. He showed that zoobenthos in the Bothnian Bay and central part of the Bothnian Sea is dominated by the isopod Saduria entomon and the amphipods Monoporeia/Pontoporeia. This is confirmed by our data, with Ma. balthica as another characteristic species of community 1 (both abundance and biomass based) increasing in importance where salinity is higher (above 4.5).
Standardized taxonomic units in abundance/biomass matrices are essential to ensure ecologically meaningful interpretation of clustering of stations. Unresolved taxonomic units are a problem that was      Figure 6 for colour legend, Table 3 for community characteristics, and Table 5 for accuracy estimates. This figure is available in black and white in print and in colour at ICES Journal of Marine Science online.
The Baltic Sea scale inventory of benthic faunal communities 1207 identified to family and to order level could be documented for the same station. As a result of such split, the Diptera as taxonomic unit was underrepresented in the relative abundance and was not picked as characteristic by the indicator species analysis for abundance-based communities. However, Diptera (and particularly the family Chironomidae) are relevant as characteristic species in communities 1 and 2 (where they have especially high subdominance in the shallow northern regions along the coast). The treatment of Diptera as one uniform taxonomic unit would result in the shift of branching between the stations of these two communities.
It is important to highlight that our community analysis results are not directly comparable with HELCOM HUB classification of biotopes. The HELCOM HUB developed particularly for the Baltic Sea is a hierarchical classification structured into six levels, whereby the first three habitat levels are defined by abiotic factors such as the presence of photic zone and substrate type. These habitat levels then constrain levels 4 -6 that describe biotope. The finest biotope level 6 accounts only for biomass of dominating taxa, predominantly bivalves (HELCOM, 2013b). In the southwestern German Baltic EEZ, Schiele et al. (2015) show that molluscs dominate the biomass. Our data show that the most northeastern regions, where salinities are very low, are dominated by Amphipoda and Isopoda in Bothnian Bay, and Polychaeta (Marenzelleria spp.), Oligochaeta, and Chironomidae in the Gulf of Finland. In contrast to 328 biotopes defined by HUB, our communities are delineated based solely and entirely on biological data, taking into account the abundance or biomass of all the species present. We consider changing community composition as a continuum without necessarily ascribing a biological boarder to change of sediment type or photic zone on a map.

The dataset
To provide a solid statistical estimation, we included data with different sampling and laboratory methods, or stations not necessarily fully representative for the respective subareas. As already pointed out by Carman and Cederwall (2001), this approach can sometimes lead to under-or overestimates of total biomass for different subregions. Differences in gear and mesh size within the joint dataset (Table 1) can cause concerns about the validity of the classification. For example, no standard 1 mm sieved data are available for the Russian part of the Gulf of Finland and for Estonian waters. Instead of excluding these regions, we chose to include the data based on 0.25 or 0.4 mm sieves. The risk of misleading estimate due to this methodological difference while comparing between sites was reduced by (i) considering only macrobenthic species, and (ii) the use of data transformation before the community analysis to reduce variance inflation. The total biomass values depend mainly on a few large species. The findings of Elmgren (1978) supported the point that within regions of relatively high macrofauna biomass, meiofauna are likely to have less importance for the total biomass.
Densities of individual species vary greatly seasonally but also between years. Due to these fluctuations, only rough estimates (with significant errors bars) can be made for the benthos biomass of the whole Baltic Sea and its individual subregions. For some regions and depth strata, the total macrofauna wet weight estimated in this study is in good agreement with Carman and Cederwall (2001), e.g. for the zone 70 -120 m in Bothnian Bay and the Quark region. Our estimate for the 30 -70 m strata of the same region is at least one order of magnitude higher than reported by Carman and Cederwall. Similar numbers hold for the Baltic Proper. Additionally, total biomass calculated for the deepest part of the region calculated in this study was two orders of magnitude larger (Table 6).

Spatial modelling and mapping
Many recent case studies that report spatial mapping of benthic fauna were focused either on some distinct regions (Glockzin and Zettler, 2008; or relating the distribution of biodiversity or single species to three physical predictors. These are sediment properties and two main environmental variables for brackish-water systems, salinity and oxygen supply (Ojaveer et al., 2010;Zettler et al., 2014). The presented analysis was using averages of all the available sampling events for faunal abundance and biomass over a 5 × 5 km grid which includes the integration of the temporal variation over the available data ( Figure 1). The use of large grid cells also provides a smoothing effect on certain scales of the predicted major benthic macrofaunal communities over the Baltic Sea. This can be seen as a trade-off between capturing the most important large-scale patterns in their distribution while omitting the fine-scale structures phenomena. Table 4. Confusion matrix for the Random Forest of abundance-based classes (classes 1 -10 are communites 1 -10, class 11-"other" communities, class 12-cells with only no macrofauna records, respectively). Note Random Forest fails to predict classes 6, 9, 10, and 11 based on available predictors. Model note: 2703cellsAvAbu_7_SpatJoin_for_Sed_kl_check: factor(community) iowtopo_bs + tempulf + speedUlf + saltmean_L + factor(factor(sedkor), levels ¼ c("1", "2", "3", "4", "5")) + saltstd3. 1208 M. Gogina et al. To generate full coverage prediction maps based on classified point data, we have used the Random Forest method, chosen due to its best predictive performance achieved in most of the comparative studies (Collin et al., 2011;Reiss et al., 2011;Šiaulys and Bučas, 2012). Despite an overall good performance, this method has limitations. For example, Marenzelleria community 4 Figure 8. Distribution of interpolated total wet weight biomass, derived using ordinary kriging interpolation of available biomass data averaged per 5 × 5 km grid cell. Transparent light grey and dark grey areas mask out the deep water hypoxic and anoxic oxygen conditions. Note that at the areas where biomass data are lacking interpolation artefacts are evident, for instance, values at the shallow parts of the Eastern Gotland Basin at the west coast off Latvia are presumably too low. This figure is available in black and white in print and in colour at ICES Journal of Marine Science online.
The Baltic Sea scale inventory of benthic faunal communities 1209 in the biomass-based classification is not well captured by the Random Forest model (see error rate in Table 5). Important predictors that drive and explain the distribution patterns of this community, such as oxygen concentrations, are not included in this study.

1210
M. Gogina et al. misclassifications are mainly associated with uncommon (rarely sampled) classes. Apart from an unbalanced design and lack of available well-suited predictors, the low model skill for such class as "other" in our study is caused by very high natural variability. Rather than classifying models as correct or incorrect, they should be viewed as useful starting points for different applications (Reiss et al., 2011). The presented community maps are general overviews and many local details including temporal variability of benthic fauna are masked out in our representation. Due to a coarse approach however, these maps must be used with caution.
The relevance of the presented maps and their temporal validity is also limited by the regional variability of benthic communities. In some regions, the community structure is stable over time, whereas in other areas, higher fluctuations or even regime shifts due to species invasions are recorded [e.g. Gogina et al. (2014) in the southwestern Baltic Sea; Maximov (2011), Maximov et al. (2014 for the eastern Gulf of Finland; Kauppi et al. (2015) for the major part of the Baltic Sea area]. The variable environment of the Baltic Sea (especially oxygen and temperature conditions) is another reason of temporal changes in benthic communities (Laine, 2003;Rousi et al., 2013).

Comparison with historical data
In his work from the beginning of the last century, Petersen (1913) mapped the benthic macrofauna communities of the western Baltic Sea and Swedish west coast, delineating a Brissopsis lyrifera community in the Kattegat and an Echinocardium cordatum community along the Danish coast southwest of Laesø. Macoma communities are mostly reported further south to the Great Belt and the Sound down to Rügen. In our study, these areas match our biomassbased community 14 (B. lyrifera, Amphiura spp., Thracia convexa, Scoletoma fragilis, Hyala vitrea, Lipobranchius jeffreysii), community 11 (Phoronis spp., Lanice conchilega, A. islandica, Te. fabula, Ec. cordatum, Nucula sulcata, Ma. calcarea) and community 2 (He. diversicolor, My. arenaria, Hydrobiidae, C. glaucum), correspondingly. As for abundance-based communities, these were community 5 (Amphiura spp., Ab. nitida, G. oculata, E. tenuis, T. flexuosa, N. nitidosa, D. glaucus), community 7 (Phoronis spp., T. fabula, Th. phaseolina, O. borealis, S. bombyx, B. lanceolatum, Sp. arndti), and community 2 (Hydrobiidae, P. elegans, C. glaucum), respectively. In Petersen (1918), an Ab. alba community occurs around the Island of Funen down to deepest parts of the Femarnbelt and the Bay of Mecklenburg. This roughly overlapped with our biomassbased community 3 (Diastylis, Astarte, My. truncata, Dendrodoa grossularia, Ab. alba, L. koreni, Trochochaeta multisetosa, Lineus ruber) and community 6 (Capitella spp.), possibly indicating the latter to be a degraded variation of the former one. Naturally, communities' distribution largely deviates, and it is important to outline that this not only mirrors distributional changes over time, or depiction of different characteristic species. This highlights the large transitional nature of areas that cannot be captured by statistical methods employed in this study.
Comparing the penetration of some marine and brackish-water animals into the Baltic Sea reported in Zenkevitch (1963), our study shows that many species overlap (e.g. Gammarus spp.). Others indicate different distribution limits according to the recent data. For example, distribution limits for Mya penetrate more to the north, whereas Priapulus does not occur beyond the 55.58N. Zenkevitch reports only 25 known polycheate species in the southern part of the Baltic Sea. In contrast, our data list over 90 polycheate species, which are found in the Bay of Mecklenburg alone. Moreover, by comparing our aggregated dataset with the data published by Jarvekulg (1979), we show the more northerly penetration of the Gastropoda Alderia modesta in the Bothnian Sea. Bivalvia Ma. calcarea has its easternmost record on the edge of Arkona Basin, i.e. absent in the inner parts, according to our data. These inconsistencies might be caused by shifts in the species distribution during the last 30 years. Moreover, our larger set of collected field data helped to detect species, which were overlooked in former times. Finally, the differences in taxonomic expertise can be responsible for the deviations.

Conclusions and outlook
The present standardized dataset represents an important baseline for the present state of the benthic macrofauna in the Baltic Sea. It can therefore be used for multiple analyses of benthic fauna distribution and its features and has been a unique opportunity to collate all benthic macrofauna data from the Baltic Sea region to finally create Baltic Sea benthic communities maps, targeted at the community level.
The results strive towards a Baltic wide picture, and of course when going more in to details, more groups and species would appear important. The work will continue with further refinement of presented results and description of finer community units in selected regions.
A different approach, in contrast to the presented study for spatially explicit mapping, is to model key species distributions separately, similar to Schiele et al. (2015), where communities can be derived by overlaying the expected averaged abundance and biomass and estimating the expected dominance. Moreover, smaller grid cell sizes might be considered (1 km, 1 nm, 2 km as in HELCOM), as well as 10 km (EEA reference grid) or even an adaptive grid. An estimation of the spatial variability within a grid cell should be addressed in future studies and should include temporal changes in local community structure. This study considered abundance and biomass-based communities separately, although incorporation of these two metrics into a single meaningful metric remains an open question.
This paper provides a baseline for future studies for better recognition of links between benthic fauna and ecosystem functioning. As an outlook based on the derived maps, the spatial distribution of specific macrozoobenthos functions (e.g. biodeposition and bioturbation) can be estimated. This will help to better understand the possible effects of natural or human-induced changes in macrofauna communities in the hosting ecosystem.

Supplementary data
Supplementary material is available at the ICESJMS online version of the manuscript. Additionally, the data outputs are made accessible through http://gis.ices.dk/sfdev/ and http://geo.ices.dk/ geonetwork/srv/en/ under "Benthic faunal communities in the Baltic Sea". acknowledged by UG. The authors especially thank all the colleagues involved in the fieldworks and laboratory analysis. We are grateful to the editor of the ICESJMS Michel J. Kaiser and to anonymous reviewers for the helpful comments and thorough editing work to improve the clarity of the text, and to Anastasia Semenova for correcting the English.