Suitability of open digital species records for assessing biodiversity patterns in cities: a case study using avian records

Openly available species observation records on various online platforms achieve good coverage in urban areas. Thus, such digital data could provide a basis for biodiversity assessments in cities. Here, we investigated the suitability of open digital species occurrence data, compared with systematically field surveyed data, in Freiburg, Germany (a Western city) and Dhaka, Bangladesh (a global-South city). We focused on resident bird species richness as an indicator of local biodiversity. We collected avian records for urban areas from ‘ornitho.de’ in Freiburg and ‘gbif.org’ in Dhaka. Additionally, we conducted point count surveys at several urban locations in both cities. Using these records, we prepared three grid-based datasets (cell size 250 m × 250 m): an open digital dataset (i.e. records compiled from well-surveyed grid cells), a field surveyed dataset (i.e. records of systematic bird surveys) and a combined dataset (i.e. digital data and field data combined). We compared the relationship of resident bird richness with different habitat factors by applying linear regression models, separately using each of the three datasets. We assessed the suitability of data from online platforms by comparing the variables retained after model selection based on digital data versus field surveyed data. We found that field surveyed data and combined data did not alter the general understanding of the key driving factors of bird richness patterns that we obtained from open digital data. This held for both case examples, Freiburg and Dhaka. This suggests that open digital data from well-surveyed urban locations can provide a suitable basis to assess drivers of biodiversity patterns within cities.


Introduction
Consistent data on the occurrence and distribution of species underpin biodiversity conservation planning. A lack of such data leads to information gaps and affects the understanding of spatial and temporal biodiversity patterns (Soria-Auza and Kessler 2008; Amano et al. 2016). The enormous amounts of species observation records available from a variety of online platforms can help to minimize such gaps. Since the coverage of species occurrence reporting on online platforms is generally higher in urban areas than in rural sites (Tye et al. 2017; Speed et al. 2018; Petersen et al. 2021), readily available digital species records hold a huge potential to advance biodiversity research and conservation in cities. In fact, open digital data are increasingly used in urban settings, particularly to assess and monitor population trends (e.g. of birds and butterflies), to model species distribution, and to measure species-specific responses to urbanization (Wang Wei et al. 2016; Callaghan et al. 2020). Nevertheless, digital species observation records commonly remain unused due to potential biases (Sousa-Baena et al. 2014), and representation of such data in urban ecological research is below expectation (Wang Wei et al. 2016). Besides, as indicated by the scarcity of data and reports in developing compared with developed regions (Amano et al. 2016; Chandler et al. 2017), such data are not readily used in assessing, monitoring and planning for urban biodiversity in cities across the world.
The concept of citizen involvement in species data collection dates back to the 19th century, when it was initiated by the American Ornithologists' Union (Droege 2007), and has since been widely adopted by bird watchers around the globe. In recent times, this approach has evolved with the emergence of web-based platforms, where citizens collect and share their species observation data quickly and easily. Globally, a vast amount of species records is now available on various online platforms (Sullivan et al. 2014; Amano et al. 2016). One of the largest global platforms is the 'Global Biodiversity Information Facility (GBIF)', which delivers more than one billion occurrence records of species from a number of taxonomic groups (www.gbif.org). Another notable platform is 'eBird', which offers more than 30 million checklists of avian occurrence records worldwide (www.ebird.org) and is one of the largest contributors of bird records to GBIF. Many other platforms are devoted to the accumulation of species records for specific regions (e.g. www.ornitho.de). These records are collected by local citizens and amateurs, largely outside of systematic surveys of bird communities. The observations are acquired mostly in a haphazard way during visits to birdwatching sites, in public parks, near observers' residences or in private gardens; they are only occasionally collected for specific projects or mapping schemes. Due to the non-standardized way in which these data are collected, these species records remain of doubtful comparability and thus of limited use for biodiversity studies and planning. Incomplete inventories, sampling variability and gaps in spatial coverage across distinct geographic areas have been flagged as major barriers to the use of species data from such online platforms in research and conservation (Hortal et al. 2007, 2015; Peterson et al. 2015; Freeman and Peterson 2019).
Given these concerns, a few researchers have assessed the completeness of digitally acquired species data, the spatial and temporal distribution of such data, and any gaps therein (Soberón et al. 2007; Sousa-Baena et al. 2014; Amano et al. 2016; Troia and McManamay 2016; Freeman and Peterson 2019; Petersen et al. 2021). Aggregating species occurrence records by grid cells is suggested as a useful technique for studies using online species records from different sources (Troia and McManamay 2016). It is further recommended to measure inventory completeness (IC) within digitally available data to understand spatial bias and to identify well-surveyed sites (Colwell and Coddington 1994; Soberón et al. 2007; Sousa-Baena et al. 2014; Troia and McManamay 2016). A 'well-surveyed' site refers to an area with high IC representing sufficient survey effort, where an additional observation/survey is likely unnecessary (Sousa-Baena et al. 2014). Presumably, species records at such sites are complete enough to identify local differences in species richness, and to assess explanatory factors for such patterns.
In this article, we compared species observation records from well-surveyed sites acquired from open digital sources with species records collected using standard field survey protocols (Bibby et al. 2000; Gregory et al. 2004), for biodiversity pattern assessments in two cities: Dhaka in Bangladesh and Freiburg in Germany. We assessed the species richness of birds (i.e. as an indicator of biodiversity), since avian records are the most prominent in comparison to all other taxa within the currently available digital databases (Chandler et al. 2017; Freeman and Peterson 2019). Moreover, birds are one of the most studied taxa in urban ecology (Strohbach et al. 2009; Trimble and van Aarde 2012; Li et al. 2019). Our study focused on urban areas, since digitally available species observation records are more frequent there than in rural sites; thus, online portals provide better spatial coverage in settled areas (Tye et al. 2017; Speed et al. 2018). Further, we chose cities on two continents, because sites in developed countries (e.g. in North America, Europe, Australia) tend to be better represented in online portals than sites in developing countries of the Global South (Amano et al. 2016; Chandler et al. 2017).
Our aim was to investigate whether avian species occurrence records from well-surveyed urban sites available on online portals can be suitable for assessing patterns of bird species richness in relation to habitat factors in urban areas, as commonly used in the context of monitoring and planning of urban biodiversity. We infer that digital data are suitable if the predictors of species richness patterns derived from them are consistent with those from data collected using standard field methods. Specifically, we expected bird species richness to decline with increasing imperviousness (Silva et al. 2015; Sultana et al. 2021), and to be high where the proportion of green structures is high (Evans et al. 2009; Mayorga et al. 2020).

Study area
We conducted our investigation in two cities, Freiburg and Dhaka, to cover two alternate scenarios in the availability of open digital data. For Freiburg, as is typical for Western world cities, plenty of digital species occurrence records are available from online platforms, whereas we considered Dhaka as a model of cities located in the global South where digital data collection remains deficient. For our intended study, systematic field surveyed data were easily accessible in both cities.
Freiburg is a medium-sized Western European city with a population of ~230 000, located in southern Germany (Statistisches Landesamt Baden-Württemberg 2018). Within the administrative limits (153 km²), this city boasts 47.1% green space, of which ~32% is forest cover (Beatley 2000; Medearis and Daseking 2012). It is recognized globally as a model of green and sustainable cities (Buehler et al. 2011) and is a popular bird watching site for local ornithologists.
In contrast, Dhaka is a megacity located in Southern Asia with a population of ~14 million (www.citypopulation.de) and it is the capital of Bangladesh. We focused on the central city region of greater Dhaka, which has a surface area of 306 km², of which ~117 km² are built-up areas (Khaleda et al. 2017). A large extent of farmland exists within the boundary of the city area (about 64 km²) and its surroundings (around 193 km² within a 5 km buffer) (GISAT 2011); farmland extent has decreased due to increasing urbanization in the last few decades (Khaleda et al. 2017). Moreover, only 8% tree coverage remains within Dhaka, due to a rapid increase in built-up areas (Islam 2001; Byomkesh et al. 2012).

Compilation of bird records
For both Freiburg and Dhaka, we collected readily available bird occurrence records from digital platforms ('ornitho.de' in Freiburg; 'gbif.org' in Dhaka). Additionally, we surveyed birds in the field at pre-selected urban sites using standard protocols. We compiled three different datasets: (i) an open digital dataset comprising bird observation records from online platforms; (ii) a field dataset comprising bird observation records from systematic bird surveys; (iii) a combined dataset of bird records from both open digital sources and field surveys.
Following a grid-based data aggregation approach, we mapped all bird observation records from each dataset in ArcMap (version 10.5.1). We used the spatial join tool to allocate each bird record to one of the grid cells, at a resolution of 250 m × 250 m, across the urban area of the cities, and excluded any cells which were predominantly forest sites/agricultural areas (Figs 1 and 2).
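The grid-based aggregation described above can be illustrated with a minimal sketch. It is not the ArcMap spatial join used in the study; it assumes that records are already in a projected coordinate system with units of metres, and the function names, grid origin and example records are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 250.0  # grid cell edge length used in the study


def cell_index(x, y, origin_x, origin_y, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    return col, row


def species_by_cell(records, origin_x, origin_y, cell_size=CELL_SIZE_M):
    """Group species names by grid cell; records are (species, x, y) tuples."""
    cells = defaultdict(set)
    for species, x, y in records:
        cells[cell_index(x, y, origin_x, origin_y, cell_size)].add(species)
    return cells


# Hypothetical records with coordinates in metres (assumed projection).
records = [
    ("Turdus merula", 10.0, 20.0),
    ("Parus major", 240.0, 249.0),
    ("Turdus merula", 260.0, 20.0),
    ("Passer domesticus", 100.0, 600.0),
]
cells = species_by_cell(records, origin_x=0.0, origin_y=0.0)

# Species richness per grid cell = number of distinct species recorded there.
richness = {cell: len(species) for cell, species in cells.items()}
```

Cells dominated by forest or agricultural land would still need to be excluded in a separate step, for example by filtering the cell keys against a land-cover mask.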

Open digital dataset comprising bird observation records from online platforms
For Freiburg, digital bird records from a 5-year period, 2012–6, were collected from the 'Ornitho.de' database (www.ornitho.de). These consisted of 48 471 observation records of birds in Freiburg city. Each record comprised a species occurrence at a geographic location along with the number of individuals observed. Ornitho.de is considered the main digital platform for avifaunal data in Germany, whose legal entity is DDA (Dachverband Deutscher Avifaunisten). This database is associated with different partner organizations, working groups and specialist groups, facilitating volunteer bird observation at the local level.
Dhaka lacks a regional database platform comparable to Ornitho.de. Here, we considered digital bird records from GBIF. GBIF (www.gbif.org) is the largest global online platform of species occurrence records for different taxa. It contains an enormous amount of avian records, most of which come from eBird (www.ebird.org), the largest online platform for bird observations. We examined all available species occurrence records for Dhaka in GBIF (gbif.org 2019). Aves occurrence records were few, and annual numbers did not show significant trends in the frequency of online reporting. Hence, in order to increase our sample size, we extracted available bird occurrence records for an extended period of 1995–2017. In total, we found only 1062 bird records for Dhaka city; each record comprised a species incidence at a geographic location.
We considered only human observation records; any museum specimen records were removed. We checked and removed records with geospatial issues (e.g. if recorded coordinate, geodetic datum or coordinate precision appeared invalid). Additionally, we cleaned the datasets to remove any potential species identification errors in consultation with local ornithologists. We considered only the records which contained species-level identifications of birds.
All bird records were joined with the grid cells of the cities. For our analysis, we used only the records from selected grid cells, which were well-surveyed sites at the open digital source level. For this, we measured IC at each grid cell that contained bird observation records. We followed the formula $\mathrm{IC} = S_{\mathrm{obs}} / S_{\mathrm{exp}}$ (Colwell and Coddington 1994; Sousa-Baena et al. 2014; Freeman and Peterson 2019). Here, $S_{\mathrm{obs}}$ is the total number of observed species per grid cell; $S_{\mathrm{exp}}$ is the expected true species richness, which we estimated with the Chao 2 estimator of species richness (Chao 1984; Colwell and Coddington 1994), using the R package fossil (Vavrek 2011). The IC level used to select well-surveyed grid cells varies in the existing literature; on average, a value ≥0.50 was used to confirm well-surveyed areas (Sousa-Baena et al. 2014; Troia and McManamay 2016). Accordingly, we selected well-surveyed grid cells in the open digital dataset using a threshold of IC ≥0.50 and a frequency of species occurrence records ≥10 (Figs 1 and 2). The latter criterion allowed us to remove sites with a low sample size despite their high IC.
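As an illustration of the IC calculation, the following sketch computes a Chao 2 estimate and applies the two selection thresholds to a single grid cell. It is not the fossil implementation used in the study. It assumes the classic Chao 2 form with the bias-corrected variant as a fallback when no species occurs in exactly two sampling units, and it assumes that a sampling unit is a visit or observation day; the function names and example counts are hypothetical.

```python
def chao2(incidence_counts):
    """Chao 2 richness estimate from per-species incidence counts.

    incidence_counts holds, for each observed species, the number of
    sampling units (assumed here: visits or days) in which it was recorded.
    """
    s_obs = len(incidence_counts)
    q1 = sum(1 for c in incidence_counts if c == 1)  # uniques
    q2 = sum(1 for c in incidence_counts if c == 2)  # duplicates
    if q2 > 0:
        return s_obs + q1 * q1 / (2.0 * q2)
    # Bias-corrected form; avoids division by zero when there are no duplicates.
    return s_obs + q1 * (q1 - 1) / 2.0


def inventory_completeness(incidence_counts):
    """IC = S_obs / S_exp, with S_exp taken as the Chao 2 estimate."""
    s_exp = chao2(incidence_counts)
    return len(incidence_counts) / s_exp if s_exp > 0 else 0.0


def is_well_surveyed(incidence_counts, n_records, ic_min=0.50, records_min=10):
    """Apply both thresholds from the text: IC >= 0.50 and >= 10 records."""
    return (inventory_completeness(incidence_counts) >= ic_min
            and n_records >= records_min)


# Hypothetical grid cell: 8 species, 3 uniques and 2 duplicates, 19 records.
counts = [5, 3, 1, 1, 2, 2, 4, 1]
ic = inventory_completeness(counts)
selected = is_well_surveyed(counts, n_records=19)
```

The record-count criterion matters for sparse cells: a cell with three species seen once each reaches IC = 0.50 under this sketch but is rejected because it holds fewer than 10 records.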

Field dataset comprising bird observation records from systematic bird surveys
We surveyed birds in Freiburg and Dhaka, following commonly used techniques of fixed radius point counts (Bibby et al. 2000; Gregory et al. 2004). Bird counts have been used to investigate avian communities along urbanization gradients by many other urban ecologists (Blair 1996; Clergeau et al. 1998; Sandström et al. 2006; Menon et al. 2014; van Heezik and Seddon 2017). Since cities include many restricted areas, traditional sampling designs for rural/forest areas were unsuitable for the urban environment (DeGraaf et al. 1991; van Heezik and Seddon 2017). We thus limited our surveys to easily accessible public sites.
In Freiburg, we conducted a field survey at 45 point locations during March–May in 2018. The locations were distributed across nine urban plots (each with an area of ~1 km²) covering different densities of built-up areas.
In Dhaka, we conducted an extensive field survey during February–June in 2018. Unlike Freiburg, Dhaka is a big city, but the number of bird records available from open digital platforms was low. Therefore, we extended the duration of the survey to allow us to inspect a larger number of urban point locations. We surveyed 172 point locations, which were distributed across 17 urban plots (each with an area of ~1 km²) and along 10 line transects (1–8 km in length) across the city.
For our bird survey, we visited our preselected plots/line transect sites during four hours in the early morning and four hours before sunset, when bird activity is generally high. Within each study site, we chose our first survey point randomly; subsequent points were chosen by walking between 100 and 200 m along any accessible walking paths. Thus, all of our field points were at least 100 m apart from each other. At each point, all bird species occurrences were recorded by field ornithologists following a point count method for 10–15 min and using a 50 m radius.

Combined dataset of bird records from open digital sources and systematic field surveys
The data from selected well-surveyed grid cells at the open digital source level and the data of our field surveyed grid cells were merged.
In both Freiburg and Dhaka, open digital datasets and field datasets varied in terms of the data collection period; thus, the proportions of observed migratory birds were higher on online platforms owing to records being collected across the seasons. To reduce bias and for better comparison between the datasets, we considered only records of bird species thought to be resident at the city scale. For this, we identified the origin and migration status of birds in both cities using species fact sheets and range maps of BirdLife International (BirdLife International and Handbook of the Birds of the World 2018; BirdLife International 2020). In the case of Freiburg, this might include some species which are partly resident: in the Black Redstart, for instance, a few individuals are year-round residents, but most are migrants. However, surmising the migratory status of individual birds is beyond the scope of the current paper. We thus strictly followed BirdLife range maps to identify and extract the records of resident bird species. For our analysis, we measured resident bird species richness (i.e. number of species) per grid cell in each of the three grid-based datasets.

Habitat factors
We extracted data for different predictor variables at the grid cells from available raster datasets and local urban atlas map datasets. To represent the urbanization effect on bird species richness, we used the percentage of impervious surface as a proxy for the extent of built-up areas (Brown de Colstoun et al. 2017). In addition, we considered the amount of high density urban fabric, which refers to the total area (m²) with >50% soil sealing level (GISAT 2011; Ares 2012; EEA 2017). To represent the effect of green structures, we used the vegetation fraction (%) (Broxton et al. 2014) and the total urban green area (m²) per grid cell. Urban green area included public gardens, parks, zoos, suburban natural sites bordered by urban fabric, water bodies, and any green sites used for sports, leisure or recreational activities (GISAT 2011; Ares 2012; EEA 2017). In addition to this, we measured proximity (i.e. distance, m) from each grid cell's centroid to the nearest edge of forest area in Freiburg, and to the nearest edge of agricultural area in Dhaka (GISAT 2011; EEA 2017). Among all habitat factors (Supplementary Table 1), we performed a Pearson correlation analysis separately within each dataset. We estimated the variance inflation factor (VIF) to check for correlation effects (i.e. a correlation coefficient >0.70 may induce multicollinearity); we considered VIF <5 to confirm that collinearity was not an issue (Akinwande et al. 2015).
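The collinearity check can be sketched as follows. This is an illustrative standard-library implementation and not the authors' code: it relies on the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {predictor: VIF} from the inverse of the correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical per-cell predictor values.
predictors = {
    "impervious_surface": [10, 30, 50, 70, 90],
    "vegetation_fraction": [80, 65, 40, 35, 5],
    "proximity": [120, 300, 80, 500, 260],
}
vifs = vif(predictors)
# Predictors at or above the VIF threshold of 5 would be flagged as collinear.
flagged = [name for name, value in vifs.items() if value >= 5]
```

For two predictors with correlation r, this reduces to VIF = 1 / (1 - r²), so a pairwise correlation of 0.70 corresponds to a VIF of about 1.96.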

Sampling factors
Along with the habitat-related predictor variables, we assumed that the frequency of observation records (FR), the frequency of observed days (FoD), the number of observers and the number of surveyed point locations (Supplementary Table 1) might have affected resident bird records per grid cell. In the case of open digital data, we presumed additional sampling effects due to varying levels of observer experience (Kelling et al. 2015; Johnston et al. 2018) (Supplementary Table 1). We compared the Pearson correlation coefficients of resident bird species richness with all these sampling-related factors in each dataset separately.
The sampling effect among the datasets varied, possibly owing to the variation in study effort. Hereafter, we identified the topmost correlated sampling factor of bird species richness within each dataset separately in both cities. We considered the identified sampling factor as a covariate of bird species richness during model assessment using the corresponding dataset. More precisely, in Freiburg, FoD was the identified sampling factor in the open digital dataset and in the combined dataset; FR was the sampling factor in the field surveyed dataset. In Dhaka, FR was the identified sampling factor of bird species richness in all the datasets.
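The selection of the sampling covariate can be sketched as below. This is a hedged illustration rather than the authors' procedure: it assumes that 'topmost correlated' means the largest absolute Pearson coefficient with richness, and the factor values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_sampling_factor(richness, factors):
    """Return the factor with the largest |r| against richness, plus all r values."""
    scores = {name: pearson(richness, values) for name, values in factors.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical per-cell values for one dataset.
richness = [5, 8, 12, 15, 20]
factors = {
    "FR": [10, 30, 25, 60, 80],   # frequency of observation records
    "FoD": [2, 4, 6, 7, 10],      # frequency of observed days
    "observers": [1, 3, 2, 2, 4],
}
best, scores = top_sampling_factor(richness, factors)
```

Running the selection separately per dataset and city mirrors the text, where different covariates were retained for different datasets.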

Model analysis
Our response variable was the number of resident bird species (i.e. species richness) in the grid cells (250 m × 250 m). To achieve normality, this number of resident bird species was log10-transformed in the case of Dhaka.
For the datasets of Freiburg, we checked model fit (i.e. using a full model that combines all variables) by applying linear regression (lm) and generalized linear regression models (glm; family = Poisson and negative binomial); the linear regression model showed a better fit with the lowest AIC. For the datasets of Dhaka, the linear regression model fit was also appropriate. Hereafter, we applied linear regression models to explore and compare the best causal relationship of resident bird species richness with the urban habitat-related variables in both cities.
We built similar models, using the (i) open digital dataset, (ii) field surveyed dataset and (iii) combined dataset. We formed a global model that included all the variables of habitat factors; one model included only the variables related to urbanization effect; one model included only the variables related to green structures' effect. Additionally, five simpler models were created for each of the habitat factors separately. All eight of these models included the identified sampling effect factor as a covariate of bird species richness.
To assess the best causal relationship of bird species richness with habitat factors, we compared the coefficient parameters among the models. We compared the entire set of models and  (Peterson et al. 2018) and effects (Fox 2003).
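The structure of the eight candidate models and the ranking by AICc and Akaike weight reported in the Results can be sketched as follows. This is an assumption-laden illustration, not the authors' R workflow: the predictor labels are hypothetical names for the five habitat factors, the AICc is computed from the residual sum of squares of a least-squares fit (omitting an additive constant that cancels when models are compared on the same data), and the example scores are invented.

```python
import math

# Hypothetical labels for the five habitat factors described in the text.
URBAN = ["impervious_surface", "high_density_urban_fabric"]
GREEN = ["vegetation_fraction", "urban_green_area"]
PROXIMITY = ["proximity"]


def candidate_models(covariate):
    """Build the eight predictor sets; each one includes the sampling covariate."""
    habitat = URBAN + GREEN + PROXIMITY
    models = {"global": habitat, "urbanization": URBAN, "green": GREEN}
    for variable in habitat:
        models[variable] = [variable]
    return {name: list(preds) + [covariate] for name, preds in models.items()}


def aicc_gaussian(rss, n, n_coef):
    """AICc of a least-squares fit with n observations.

    n_coef counts regression coefficients including the intercept;
    the residual variance adds one further estimated parameter.
    """
    k = n_coef + 1
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: Akaike weight}."""
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


models = candidate_models("FoD")
# Invented AICc scores for three of the models, for illustration only.
example_scores = {"impervious_surface": 100.0, "urbanization": 102.0, "global": 106.0}
weights = akaike_weights(example_scores)
```

The model with the lowest AICc always receives the highest weight, which is why the two criteria agree in the reported tables.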

Results
Open digital data revealed key predictor variables of urban bird species richness similar to those resulting from field surveys. This held true not only for the Western city of Freiburg with its rich coverage in digital data platforms, but also for the poorly covered global Southern city of Dhaka.

Freiburg
In Freiburg, 630 urban grid cells accounted for 21 150 resident bird observation records obtained from open digital sources. Among these, 48 urban grid cells were selected as well-surveyed, and included observation records of 70 bird species. We covered 42 urban grid cells during field surveys and documented 32.8% (23 resident bird species) of the species recorded in open digital source. Combining all records, we found that the most frequently observed species were Turdus merula (Eurasian Blackbird), Passer domesticus (House Sparrow), Parus major (Great Tit), Fringilla coelebs (Common Chaffinch) and Parus caeruleus (Eurasian Blue Tit), a pattern which largely held true across all datasets (Fig. 3).

Model results
The model analysis of bird species richness with predictors, using all three datasets, showed that the highest weight and lowest AICc were achieved by the model which included the variable impervious surface (Table 1). The selected models pointed to a negative effect of the proportion of impervious surface on bird species richness. The explained variation (adjusted R²) in the selected models was 45% using open digital data, 29% using field surveyed data and 77% using combined data.
The averaged model coefficients using separate datasets similarly indicated impervious surface as a significant predictor of bird species richness (Table 2). The negative effect of the variable 'proximity (i.e. distance) to forest area' was retained as significant for bird species richness only in the model average summary using field surveyed data.
The fitted values of bird species richness did not indicate any alteration in the linear relationship of observed richness to the predictor variable 'impervious surface' among the datasets (Fig. 4). Using the combined dataset, the plot of fitted values of bird species richness showed some grouping effect, indicating different sets of data. However, the trends in the linear relationship of bird species richness to the predictor variables using separate datasets were similar. Beyond this, the effect of sampling factors was stronger in open digital data than in field-collected data; inclusion of the topmost correlated sampling factor 'frequency of observation days (FoD)' as a covariate of bird species richness improved the model performance (i.e. for the selected model with FoD, AIC = 345.2 and for the similar model without FoD, AIC = 372.5). There was no issue of normality in the selected models. We examined heteroskedasticity by applying a heteroskedasticity-consistent variance-covariance estimator (vcovHC), which showed no inconsistency in the standard errors and significance levels of the variables.

Dhaka
In Dhaka, 45 urban grid cells (<1% of the total city grid cells) contained 947 resident bird observation records at the open digital data level. Among these, 12 grid cells were selected as well-surveyed (Fig. 2), and included records of 67 bird species. Our field surveys covered 128 grid cells, where we documented 73% (49 species) of the species reported in digital data. Combining all observation records, we found that the most frequently observed species were Passer domesticus (House Sparrow), Corvus splendens (House Crow), Acridotheres tristis (Common Myna), Milvus migrans (Black Kite) and Gracupica contra (Asian Pied Starling), a pattern which largely held true across the datasets (Fig. 5).

Model results
The model assessment using all three datasets separately revealed comparable results, showing that the lowest AICc and highest weight was achieved by the model which included the variable vegetation fraction (Table 3). The summary of the top ranked models pointed to the positive effect of vegetation fraction for bird species richness in Dhaka. The explained variation in the selected model using open digital data was 69%, using field survey data it was 89% and using combined dataset it was 91%. The averaged model coefficients using separate datasets indicated vegetation fraction was a significant predictor of bird species richness in field surveyed data and combined data (Table 4). However, the significance of this predictor variable using the digital dataset differed from the results obtained using the field survey and combined datasets (Table 4). This variation in the significance level of the predictor might have resulted from the small sample size of well-surveyed locations.
Residual plots, the test of normality and heteroscedasticity in the selected top ranked models did not reveal any issues with the various datasets. The trends in the fitted linear relationship of bird species richness to vegetation fraction were also similar across the top ranked models using separate datasets (Fig. 6).
[Table note: The contrast is shown using three separate datasets: open digital data, systematic field survey data, and a combined dataset (i.e. field data and open digital data are joined). Results shown here were produced using resident bird species richness (i.e. log10-transformed for normality) as the response variable. Top ranked models with the lowest AICc score and highest Akaike weight (w_i) are shown in bold. Covariate: FR = frequency of observation records.]

Discussion
Enormous amounts of species observation records are openly available on various online platforms. Our study supports the notion that such digital data can be suitable for assessing patterns of bird species richness in relation to habitat factors in urban areas, and thus may provide a valuable basis for monitoring and planning of urban biodiversity. We assessed the suitability of data from online platforms by comparing the variables retained after model selection based on open digital data versus systematic field survey data. Overall, field data and online data alike confirmed that, as expected, urban bird species richness was negatively related to increasing imperviousness and positively related to the proportion of vegetation (Evans et al. 2009; Silva et al. 2015; Mayorga et al. 2020; Sultana et al. 2021). Most importantly, the systematic field data and a combined dataset did not alter the general understanding of the key driving factors of bird species richness patterns that we obtained using open digital data. This held for both our Western and our global South case examples, Freiburg and Dhaka, respectively.
This article focused on the urban context, the features of which are broadly represented by increasing levels of built-up areas and human settlement. We explored bird records for well-surveyed urban sites (i.e. locations with sufficient species documentation) available on digital platforms, using an IC metric (Sousa-Baena et al. 2014; Troia and McManamay 2016). We found that the top ranked relationships of bird species richness to the key predictor variables retained using data from such well-surveyed sites did not differ from those obtained using data collected through standard field surveys. This indicates that if a sufficient amount of data from well-surveyed sites can be obtained from online platforms for a city, those data can be suitable for investigating driving factors of bird species richness patterns across its urban areas. Therefore, if field data are not available, biodiversity monitoring and planning may be initiated based on data from online platforms alone, until systematic field data become available. Note, however, that this notion does not necessarily hold for other taxa and/or other indicators of biodiversity, such as species diversity or abundance of threatened species, which our study did not address.
In our study, the distribution of well-surveyed urban sites indicated gaps in the spatial coverage of bird observation records within both cities. In Freiburg, even though the city has more than 45 000 digital bird records at Ornitho.de, not all locations are well surveyed; Fig. 1 shows poorly surveyed grid cells. The case of Dhaka is shown in Fig. 2, where the majority of the grid cells contain limited or no data; the small number of well-surveyed cells reflects the fact that GBIF holds fewer than 1500 avian records for this city. In general, spatial coverage of bird observation records is higher in urbanized areas than in forest/agricultural sites in both cities. Several earlier studies also indicated spatial and temporal gaps within open digital species databases (i.e. for single species or multiple taxonomic groups) at the scale of continents, countries and regions (Soberón et al. 2007; Mora et al. 2008; Nakamura and Soberón 2009; Sousa-Baena et al. 2014). Digital data usually cover only a portion of the entire spectrum of environmental gradients in any geographic region (Hortal et al. 2007). However, species reporting by citizens was found to be more frequent in human-settled areas (Tye et al. 2017) and well-surveyed sites were found clustered around major cities (Freeman and Peterson 2019). Despite such bias, existing studies showed that web-based data collected by citizens can provide reliable predictions of species–habitat relationships (Tye et al. 2017). In an urban context, our results showed that the key driving factors of bird species richness identified through model assessment using open digital datasets, field datasets and combined datasets were similar in both cities.
Our study case in Freiburg was a model example where plenty of species observation records and well-surveyed sites were readily attainable from open digital sources for reliable analysis. The additional systematic field survey data and a combined dataset did not alter the general understanding of the key driving factors of bird species richness patterns that we obtained using open digital data. The general trends in the top ranked relationships of bird species richness to urban habitat factors, using different datasets, were consistent with earlier studies indicating that high imperviousness usually causes resident bird decline (Blair 1996; Garaffa et al. 2009; Silva et al. 2015; Sultana et al. 2021).
The study case in Dhaka added another model example, where the highest ranked relationship of bird species richness obtained through model assessments with different datasets was also similar. It highlighted the benefit of vegetated surface for bird species richness within urban environments (Chace and Walsh 2006; Donnelly and Marzluff 2006; Evans et al. 2009). However, one issue with the digital data in Dhaka concerned the low number of well-surveyed sampling sites represented in the online sources. An ecological assessment of bird species richness can be ambiguous with such an insufficient amount of data. In this case, the additional data we collected through field surveys, albeit for a single season, enhanced the spatial coverage of bird records within the city. The combined dataset provided sufficient sampling sites and supported a more reliable assessment, as indicated by its higher explanatory power.
Caution should be taken in interpreting our results. We investigated the consistency of avian observation records available from online platforms for assessing drivers of bird species richness patterns in an urban context only. Similar studies in forest/rural sites may be achievable if species reporting on online platforms is extensive (Tye et al. 2017; Speed et al. 2018). It is also important to note the influence of the spatial resolution of the grid cells. In both cities, we selected small-sized grid cells to compile geo-referenced species records in order to minimize the loss of site-specific information for the observed records (Tye et al. 2017). Large grid cells (>1 km²) might be suitable for investigations focusing on large geographic regions (Hortal et al. 2007; Soberón et al. 2007; Troia and McManamay 2016).
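The effect of cell size on compiled richness can be illustrated with a minimal sketch. It is not the analysis script of this study: the function names, the grid origin and the example records are hypothetical, and coordinates are assumed to be already projected to a metre-based reference system.

```python
from collections import defaultdict
import math

CELL_SIZE_M = 250.0  # cell edge length used in this study (250 m x 250 m)

def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return (column, row) of the grid cell containing a projected point.

    Assumption: the grid origin is at (0, 0) of the projected system.
    """
    return math.floor(x_m / cell_size), math.floor(y_m / cell_size)

def richness_per_cell(records, cell_size=CELL_SIZE_M):
    """Count distinct species per grid cell.

    `records` is an iterable of (species_name, x_m, y_m) tuples.
    """
    species_by_cell = defaultdict(set)
    for species, x_m, y_m in records:
        species_by_cell[cell_index(x_m, y_m, cell_size)].add(species)
    return {cell: len(names) for cell, names in species_by_cell.items()}

# Hypothetical records, for illustration only
records = [
    ("Passer domesticus", 10.0, 20.0),
    ("Corvus splendens", 240.0, 249.0),
    ("Passer domesticus", 100.0, 100.0),    # same species, same 250 m cell
    ("Acridotheres tristis", 260.0, 20.0),  # falls in the neighbouring cell
]

fine = richness_per_cell(records)                      # 250 m cells
coarse = richness_per_cell(records, cell_size=1000.0)  # 1 km cells
```

With these example records, the 250 m grid keeps two separate cells with their own species lists, whereas the 1 km grid merges all records into a single cell, so the site-specific information is lost.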
In addition to spatial bias, sampling-related issues can hinder the suitability of digital data in species richness prediction across any environmental gradient (Hortal et al. 2007; Zhang et al. 2019). Sampling issues can include variation in survey effort by observers (i.e. causing observer-related bias) and undersampling of certain species groups (i.e. causing taxonomic bias). They can further include higher survey effort in easily accessible areas that are often close to the observers' residences and home institutes, near roads and human settlements, and in and around protected areas (Meyer et al. 2015; Tye et al. 2017; Freeman and Peterson 2019; Petersen et al. 2021). Thus, in a model assessment of bird species richness patterns in relation to different environmental predictors, accounting for such sampling effort as covariates can help to reduce the associated bias within digital data (Warton et al. 2013; Tye et al. 2017). Such a modelling approach can help avoid the uncertainty in deriving diversity metrics from species-level predictions, and sidestep the bias in prediction of richness patterns in relation to environmental predictors outside known communities (Zhang et al. 2019).
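A minimal sketch of the covariate idea follows. It is an illustration under stated assumptions rather than the models fitted in this study: the data are synthetic and noise-free, the variable names (`imperv`, `effort`) and the helper `ols` are hypothetical, and survey effort is represented simply as the number of visits per grid cell.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic grid cells: effort (visits) rises with imperviousness,
# mimicking more frequent reporting in built-up areas.
imperv = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
effort = [2, 3, 5, 4, 8, 9, 12, 11]

# Assumed generating relationship (illustration only):
# recorded richness = 20 - 10 * imperviousness + 0.5 * effort
richness = [20 - 10 * i + 0.5 * e for i, e in zip(imperv, effort)]

# Model without effort: intercept + imperviousness
naive = ols([[1.0, i] for i in imperv], richness)

# Model with effort as a covariate: intercept + imperviousness + effort
adjusted = ols([[1.0, i, float(e)] for i, e in zip(imperv, effort)], richness)
```

In this toy example the effort-adjusted model recovers the coefficients used to generate the data, whereas the model without effort underestimates the negative effect of imperviousness, because heavily surveyed cells coincide with highly sealed ones.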
We observed that the generalist species with higher frequencies were similarly documented in both open digital data and field data (Figs 3 and 5). However, the number of all observed species was higher in digital data than in the data we collected during field surveys in both cities. The higher number of species (i.e. including many less frequently observed species) in digital sources was largely due to the fact that digital data were collected over multiple years. We considered only occurrences of resident birds to allow better comparison between the different datasets. For conservation purposes, it would, however, be important to consider both resident and migratory species (breeding/non-breeding) in any locality. Additionally, we recommend cross-checking digitally available species records with taxon-specific local experts and city bird atlas data (if available), to confirm the presence of unusual species and to account for any misidentifications. During data compilation, one should also keep abreast of taxonomic changes and species synonyms to avoid inclusion of duplicate records.
In urban settings, species-specific conservation is complex due to a shortage of suitable habitat patches and various factors associated with human well-being (such as provision of equal access to green spaces for better health benefits) (Dearborn and Kark 2010). Conservation efforts in human-developed areas can thus focus on overall species richness and prioritize steps to increase the presence of, for example, rare and threatened species as much as possible. To collect reliable species richness data for any taxonomic class, a long-term consistent field survey would be desirable as the standard approach. However, this is time-consuming and relies on the availability of resources and expert investigators. Conversely, the number of online platforms (e.g. wildenachbarn.de, citynaturechallenge.org) for documenting species occurrences in cities has increased noticeably in the last decade, suggesting that digital data will continue to become increasingly available in the future. These species observation records, reported mostly by amateur citizens, can be a good fit in urban ecology since they are easily and rapidly accessible (van Heezik and Seddon 2017). We thus emphasize that our investigation of open digital data in Freiburg and Dhaka provides useful examples of the suitability of such data for assessing drivers of urban biodiversity patterns, and thus of their applicability in biodiversity monitoring and planning in cities.

Data availability
Digital species occurrence records of birds that were used for this assessment can be requested at 'Ornitho.de' for Freiburg, and can be accessed at GBIF (doi.org/10.15468/dl.beqzow) for Dhaka. Further species data and the analysis script of this article can be found at the link: https://osf.io/yjmxq/?view_only=3ec41ace94cf4d96808fec3733bf3be80