Use of herbarium data to evaluate weediness in five congeners

A weed or not a weed? Many plant species grow somewhere on the continuum from undisturbed to very disturbed vegetation. Deciding on the degree of weediness is not an easy task, and often based only on subjective observations. In this work, we compare data obtained during systematic field surveys with the habitats recorded on herbarium specimen labels, for a group of more-or-less weedy tropical species. We show that herbarium data reflect the collection bias favouring natural vegetation, but also, that the relative weediness hierarchy stays in place. The study is relevant for other ecological studies based on herbarium specimens.


Introduction
Weeds are plants that grow and form self-sustained populations in habitats strongly modified by humans (Harper 1944, cited in Baker 1965; Radosevich et al. 1997; Rzedowski 2006). They are often unwanted, but can also play a major role in landscape ecology and aesthetics, or form part of agricultural production (Vieyra-Odilon and Vibrans 2001). Because of their numerous interactions with humans, their biology, ecology, evolution and communities are an important subject of study (Kuester et al. 2014). They overlap partly with invasive plants, understood as introduced species that cause problems, either in agriculture or in natural areas.
Weediness or synanthropy, that is the degree to which a species associates with anthropogenic habitats, is not a binary trait, but one with many intermediates. However, it is often discussed in the literature as if it were, and without providing quantitative support for the assertion. Also, ranking species on a relative scale may be useful for comparative studies of the biological traits tied to weediness or invasiveness, for example.
Previous attempts have classified species according to their association with disturbance types or other species. For example, weedy species may be categorized based on attributes such as the ability to establish new populations in sites where the vegetation cover was removed (Hill et al. 2002) or the proportion of annual or introduced species in the vicinity of the species of interest (Hill et al. 2002). Degree and kind of disturbance of the habitats occupied by weeds have been the focus of other efforts. The degree of association of a plant species with human settlements (Klotz and Kühn 2002), or the average vegetation cover of urban soil associated with a species (Hill et al. 2002), can be documented. Frequently, these various metrics are calculated with the information obtained from raw data on species' occurrences available in public databases (e.g. data from phytosociological relevés of the Braun-Blanquet tradition).
Recent years have seen a strong increase in the use of data derived from biological collections as a primary and verifiable source to address diverse environmental and ecological questions (Graham et al. 2004; Bolmgren and Lönnberg 2005; Wu et al. 2005; Pyke and Ehrlich 2010; Robbirt et al. 2011; Lavoie 2013). Information on species habitat can be obtained from collections or from data obtained directly in the field. Both types of data have potential shortcomings.
When working with collections, it is important that identifications are corroborated and that labels are checked to identify obvious mistakes. Also, collections are often biased: biologists tend to collect in less disturbed, but accessible areas (Rich and Woodruff 1992; Kadmon et al. 2004; Crawford and Hoagland 2009; Kramer-Schadt et al. 2013); prefer rare over common species (Guralnick and Van Cleve 2005; Garcillán and Ezcurra 2011) and do field work during holidays and vacations (Pyke and Ehrlich 2010).
Field data are generally more accurate, but gathering them is time consuming, costly and usually limited in space and time. Guiding fieldwork with predictive modelling can help ameliorate these issues and make the process more efficient (Phillips et al. 2006; Mateo et al. 2011; Wegier et al. 2011; Gil and Lobo 2012). There are surprisingly few studies comparing the two types of data directly (e.g. Garcillán and Ezcurra 2011), and none for the group of species known as weeds.
When detailed data on the associated vegetation or disturbance type are not directly available, other sources have to provide data for the indicators. For example, the association of species with general habitat types rated by disturbance level, and the proportion of records per species and habitat can be calculated. The anthropogenic habitats populated by weeds vary in type, intensity and frequency of disturbance (Šilc 2010). One classification of weeds distinguishes plants that have to adapt to regular episodes of soil disturbance and removal of all aboveground biomass (agrestal weeds) from those that do not or for which disturbance is irregular (ruderal weeds) (Holzner 1978, 1982). For ruderal weeds, which often grow in or near human settlements, mowing, grazing or trampling may be the crucial disturbance types. Because soil and roots are not regularly disturbed, ruderal habitats are, on average, less stressful than agrestal sites.
Here, we assess whether herbarium specimens provide accurate weediness or synanthropy rankings, compared with field data. As a case study, we used five species of the genus Melampodium (Asteraceae) with differing degrees of weediness, judging from preliminary observations and data from floras. To assess weediness, we used a modified Nuorteva (1963) synanthropy index (SI) for insects, a metric that reflects the relative frequency with which a species is found in habitats disturbed to varying degrees. We followed Hart (1976) in deriving data from herbarium specimens. Species distribution models guided our field surveys; we searched for and documented populations and their habitats during peak flowering time. The modified Nuorteva index was calculated for every species, first for the herbarium records, then for the field data. If herbarium data reflect relative weediness despite collection bias, then data obtained from specimens are a more accessible data source for assessing this characteristic than direct field surveys.

The genus Melampodium
Melampodium (Asteraceae) is a genus of 40 species, most of them annual, originally restricted to the tropical and subtropical regions of Mexico and Central America (Stuessy 1972; Blöch et al. 2009). The taxonomy and phylogeny of Melampodium are well known (Robinson 1901; Stuessy 1971, 1972, 1979; Stuessy et al. 2004; Blöch et al. 2009; Weiss-Schneeweiss et al. 2009; Blöch 2010). Melampodium is an appropriate system for the study of weediness as it has been widely collected in Mexico, and expert taxonomists specializing in the genus or the family (T. Stuessy and J.L.V., respectively) have examined the specimens. Moreover, ruderal populations are known for all species of the genus and agrestal populations for a third of the species. All but one species are annual; Melampodium americanum is considered perennial though its populations in Nayarit show a tendency towards an annual habit (Stuessy 1972). Phylogeny, karyotypes and allopolyploidy events are well documented (Stuessy 1971, 1972, 1979; Stuessy et al. 2004; Blöch et al. 2009; Weiss-Schneeweiss et al. 2009; Blöch 2010).

The study area
We limited our field study to the state of Nayarit, on the western coast of Mexico. This state is located at the confluence of four Mexican physiographic regions: the Trans-Mexican Volcanic Belt, the western and the southern Sierra Madre mountain chains and the north-western coastal plain. It thus has a highly diverse flora (3428 species) that assembles into many vegetation types (Téllez 1995; Villaseñor 2003). Fourteen species of Melampodium have been reported for Nayarit (Stuessy 1972; McVaugh 1984; Villaseñor and Espinosa-García 1988; Téllez 1995; Ortiz et al. 1998), representing over half of the sections in the genus recognized by Stuessy (1972) and Blöch (2010). Also, over 60 % of the known range of the narrowly distributed M. tepicense is found in the state of Nayarit. Documented human settlements for the state date to prehispanic times (Anguiano 1992), allowing for varied associations between plants and human activities.

Species selection and overview
Based on herbarium records and a preliminary field survey in October and November of 2011, we selected five species of Melampodium, all native, of the 14 species reported for Nayarit (Stuessy 1972; Ortiz et al. 1998): M. americanum, M. divaricatum, M. microcephalum, M. perfoliatum and M. tepicense. We aimed to cover as wide a range of weediness as possible. The general distribution, habitat information and phenology of the species included in this study and reported in the literature are shown in Supporting Information-Table S1.
We obtained the herbarium data from a national database maintained and curated by J.L.V., Asteraceae specialist and one of the authors of this study. It includes the herbarium specimens available at the National Herbarium (MEXU), with a few other additions. The database contained 1562 records with geographic coordinates for our five focal species. We modelled the potential distribution for each species using records from all of Mexico. By including data for as wide a portion of the species ranges as possible (and not only from the state of Nayarit; see below for the description of our distribution modelling approach), we hoped to capture more accurately the conditions under which the species occur, and thus increase the predictive power of our distribution models. Then, using our species distribution models as a guide for Nayarit, we conducted field surveys to locate populations and record population habitat at peak flowering time.

Herbarium specimen data and habitat categories
We examined all specimens of our five focal species of Melampodium available at MEXU to verify their identification and information on their habitat. Habitat information was only available for 543 records, all of which we included in our study. The habitat data on the herbarium labels were assigned to three categories: (i) 'natural vegetation', if the reported habitat consisted of natural vegetation, even if disturbed, unless the plant was clearly part of secondary vegetation; (ii) 'ruderal vegetation', if the label described the habitat as 'roadside', 'railroads', 'field margins', 'football fields', 'parking lots', 'old fields', 'pastures', 'secondary grassland' or more generally, secondary vegetation and (iii) 'agrestal vegetation', if the collection was from cultivated fields, plantations or gardens.
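The assignment of label text to the three habitat categories can be sketched in Python. This is only an illustration under stated assumptions: the ruderal and agrestal terms come from the category definitions above, but matching labels by substring, the function name `classify_habitat` and the order of the checks are hypothetical. In the study the labels were assessed by examining the specimens, not by an automated rule.

```python
# Terms taken from the category definitions in the text; substring matching
# and the order of the checks are assumptions made for this sketch.
RUDERAL_TERMS = ("roadside", "railroad", "field margin", "football field",
                 "parking lot", "old field", "pasture", "secondary grassland",
                 "secondary vegetation")
AGRESTAL_TERMS = ("cultivated field", "plantation", "garden")


def classify_habitat(label_text: str) -> str:
    """Assign a herbarium label's habitat description to one of three categories."""
    text = label_text.lower()
    if any(term in text for term in RUDERAL_TERMS):
        return "ruderal"
    if any(term in text for term in AGRESTAL_TERMS):
        return "agrestal"
    # Natural vegetation is the default, even if the label calls it disturbed.
    return "natural"
```

A label such as "Disturbed oak forest" falls into the natural category under this rule, which mirrors the definition of category (i).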

Distribution modelling
The national database contained 1562 records with geographic coordinates of our five focal species. Of these, 110 were of M. americanum, 935 of M. divaricatum, 142 of M. microcephalum, 347 of M. perfoliatum and 28 of M. tepicense. We randomly selected 75 % of these records to model the potential distribution of each species and set aside the remaining 25 % for verification of our models. We built our species distribution models with the program MaxEnt v. 3.3.3e (Phillips et al. 2006; http://www.cs.princeton.edu/~schapire/maxent/). As predictor variables, we used the 19 climatic variables and a digital elevation model available in WorldClim [see Supporting Information-Table S2; Hijmans et al. 2005; http://www.worldclim.org/bioclim.htm]. All variables had a spatial resolution of 1 km². The distribution models were transformed into binary presence-absence maps using a 10 % omission error to determine the cut-off using ArcMap from ArcGIS (Environmental Systems Research Institute, Inc., New York).
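The record split and the binary cut-off described above can be sketched in Python. This is a minimal illustration and not the MaxEnt or ArcMap procedure. The function names, the fixed seed and the rule for choosing the cut-off (the score below which about 10 % of training presences fall) are assumptions.

```python
import random


def split_records(records, train_fraction=0.75, seed=0):
    """Randomly split occurrence records into training and verification sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def omission_threshold(train_scores, omission=0.10):
    """Return the cut-off that leaves about `omission` of training presences below it."""
    ordered = sorted(train_scores)
    n_omitted = int(len(ordered) * omission)  # presences allowed below the cut-off
    return ordered[n_omitted]


def to_binary(score, threshold):
    """Convert a continuous suitability score into presence (1) or absence (0)."""
    return 1 if score >= threshold else 0
```

With the 28 records of M. tepicense, for example, `split_records` would keep 21 records for modelling and 7 for verification.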
Because field surveys are time consuming and expensive, we limited ours to an area of high predicted Melampodium diversity in Nayarit. To identify such an area, we first overlaid the presence-absence maps of all our focal Melampodium species for all of Mexico and identified areas where finding the five species was highly probable. Our species distribution approach yielded two main areas of high predicted diversity (Fig. 1). The largest one, on the western side of the Trans-Mexican Volcanic Belt, included the states of Nayarit, Jalisco, Colima and Michoacán. The second, smaller area was located in the southern Sierra Madre del Sur, which extends towards the Sierra Madre of Chiapas. We selected the polygon with the highest predicted diversity in the state of Nayarit as the area for our field surveys. This polygon had an area of 3174 km² and was located within a region called the 'Altiplanicie Nayarita' (Anguiano 1992) that includes the main volcanos and intermontane valleys of the Trans-Mexican Volcanic Belt in the state.
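Overlaying the binary maps amounts to summing them cell by cell. The following Python sketch assumes that each map is a small grid of 0/1 values stored as nested lists; the function names and the toy grids are hypothetical, and the actual overlay was carried out on GIS layers.

```python
def predicted_richness(binary_maps):
    """Sum per-species presence-absence grids cell by cell."""
    n_rows = len(binary_maps[0])
    n_cols = len(binary_maps[0][0])
    return [[sum(grid[r][c] for grid in binary_maps) for c in range(n_cols)]
            for r in range(n_rows)]


def high_diversity_cells(richness, min_species):
    """List (row, col) cells where at least `min_species` species are predicted."""
    return [(r, c) for r, row in enumerate(richness)
            for c, value in enumerate(row) if value >= min_species]


# Toy example with three species on a 2 x 2 grid (illustrative values only).
toy_maps = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
]
toy_richness = predicted_richness(toy_maps)
```

Cells in which all focal species are predicted would then be candidates for the survey polygon.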
The natural vegetation in the selected polygon consists of mixed montane oak and pine-oak forests, including some cloud forest in protected locations. There are some disturbed relicts of semi-evergreen tropical forests on the lower slopes, especially those facing north or west, and tropical dry forests cover the valleys in the eastern and southern portions (Fig. 1D). Because of their highly fertile soils, most of the valley area and foothills have a long history of agricultural use.
Figure 1 (caption fragment): (2013); it also shows the roads along which the survey was conducted. The abbreviations for the vegetation types are as follows: CF, cloud forest or tropical humid mountain forest; PF, pine (Pinus) forest; PQF, pine-oak forest; QF, oak (Quercus) forest; TDF, tropical dry (or deciduous) forest; TSHF, tropical subhumid forest; SV/PQF, secondary vegetation derived from pine-oak forest; SV/QF, secondary vegetation derived from oak forest; SV/TDF, secondary vegetation derived from tropical dry forest; SV/TSDF, secondary vegetation derived from tropical semi-dry (or semi-deciduous) forest; SV/TSHF, secondary vegetation derived from tropical subhumid forest; P, induced grassland; A, agricultural land.

Field data
We surveyed the area selected from our modelling approach (3174 km²) along the primary and secondary roads shown in Fig. 1D, from August to October 2012, the main flowering season. On 15 field days, all of the roads were slowly travelled three times at intervals. As the roads generally run along valleys, the populations on the hillsides with their bright yellow flowers are easily located from the roads. We visited all visible populations as well as sites that, based on previous information, were likely to host populations of Melampodium. In places with natural vegetation (see Fig. 1D) and in cultivated areas, we stopped and walked further from the road (a few hundred metres) to find populations. For every population found, we recorded species, habitat type and coordinates.
The data of our field survey were comparable with the data obtained from herbarium specimens, even though they were collected specifically along roads. A map of the processed herbarium specimens of Melampodium from central Mexico [see Supporting Information- Fig. S1] shows clearly that few of these collections were made away from roads.

Synanthropy index
To assess the plant's weediness, or ability to grow in habitats with varying degrees of association with anthropogenic environments (Hart 1976), we used an SI that takes into account three levels of disturbance: low (natural vegetation), intermediate (ruderal vegetation) and high (agrestal vegetation). We defined our SI as follows: SI = 3x + 2y + z, where x, y and z correspond to the fraction of the total number of individuals of a given species collected in agrestal, ruderal and natural sites, respectively. This newly defined index ranges from 1 to 3, with 3 representing the maximum association with habitats transformed by human activity.
Our SI is based on Nuorteva's index (1963), which is widely used to evaluate degree of insect association with urban, rural or natural habitats (Bueno Marí and Jiménez-Peydró 2011; Barata et al. 2012; Beltran et al. 2012; De Souza and Zuben 2012; Ekanem et al. 2013; Yepes-Gaurisas et al. 2013). Nuorteva's original index is calculated with the formula SI_Nuorteva = (1/2)(2a + b − 2c), where a, b and c are percentages of collections or captures in urban, rural and natural environments, respectively; values range from −100 to +100. This index is based on percentages, and the categories are weighted differentially. In contrast, our SI is based on proportions. As we considered ruderal vegetation, particularly in rural areas, to be intermediate between agrestals and natural vegetation, we gave it an intermediate weight.
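Both indices are simple weighted sums, which the following Python sketch makes explicit. The function names are hypothetical, and the sketch takes record counts for the modified SI and converts them to proportions.

```python
def synanthropy_index(agrestal, ruderal, natural):
    """Modified index: SI = 3x + 2y + z, with x, y, z as proportions of records."""
    total = agrestal + ruderal + natural
    if total == 0:
        raise ValueError("at least one record is required")
    x, y, z = agrestal / total, ruderal / total, natural / total
    return 3 * x + 2 * y + z


def nuorteva_index(urban_pct, rural_pct, natural_pct):
    """Nuorteva's original index: (1/2)(2a + b - 2c), with a, b, c as percentages."""
    return (2 * urban_pct + rural_pct - 2 * natural_pct) / 2
```

A species recorded only in natural vegetation scores 1 and one recorded only in cultivated fields scores 3. As a purely hypothetical illustration (the counts are not taken from Table 1), 15 records with none agrestal, 2 ruderal and 13 natural would give 17/15, or about 1.13.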

Statistical analyses
We first assessed the association between species and habitat category with a χ² test. Then, to rule out the possibility that the observed association of a given species with a particular habitat type was the result of stochastic processes or sampling error, we conducted randomization tests. For these tests, we used only the data derived from herbarium records. The reason for excluding the field data was that the number of records was insufficient for most species, except M. divaricatum.
We compared the observed value for each habitat of a given species against a null distribution consisting of 1000 random samples as follows: from the original pool of 543 records, by sampling with replacement, we generated a distribution of 1000 datasets of a size equal to the number of accessions for a given species (i.e. for M. americanum, we sampled 79 records 1000 times; for M. divaricatum, 279; for M. microcephalum, 78; for M. perfoliatum, 92 and for M. tepicense, 15). We then tallied the habitat occurrence per sample (i.e. how many records per sample belong to each habitat category), thus generating null distributions for each habitat type per species. For every species, we compared the observed value for a specific habitat category against the null distributions for the habitat categories. If the observed values of habitat categories for the different species were merely an effect of sampling, we would expect the observed values for all species/habitat combinations to fall within the 95 % confidence intervals (CI) of their respective resampling distributions. If, however, our observed associations between species and habitats reflect biological associations and not sampling error, we would expect the observed values to fall outside the 95 % CI of the resampling distributions of species-habitat combinations.
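The resampling scheme described above can be sketched in Python as follows. The original analyses were done in R and that code is not reproduced here; the function names, the fixed seed and the use of an empirical percentile interval for the 95 % CI are assumptions. The pool composition in the test of this sketch would also have to come from the real specimen data.

```python
import random


def null_counts(pool, sample_size, habitat, n_draws=1000, seed=0):
    """Resample the pooled records with replacement and tally one habitat per draw."""
    rng = random.Random(seed)
    return [rng.choices(pool, k=sample_size).count(habitat) for _ in range(n_draws)]


def percentile_interval(values, level=0.95):
    """Empirical interval holding the central `level` share of the values."""
    ordered = sorted(values)
    n_tail = round((1 - level) / 2 * len(ordered))  # values cut from each tail
    return ordered[n_tail], ordered[len(ordered) - n_tail - 1]


def outside_interval(observed, interval):
    """True when the observed count lies outside the null interval."""
    lower, upper = interval
    return observed < lower or observed > upper
```

For M. tepicense, for instance, `sample_size` would be 15, and the observed number of records in each habitat would be compared with the interval returned for that habitat.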
Over 50 % of the total records belonged to M. divaricatum. To investigate the extent to which the observed patterns were driven by this species, we repeated our resampling analyses excluding the records of M. divaricatum.
We also explored the possibility that the SI values were influenced by sample size. For this, we generated another set of null distributions (size 1000 again) for each species, this time resampling only 25, 50 or 75 % of the data, prior to re-calculating the SI for each species. Again, if SI calculations were independent of sample size, we would expect to obtain the same relative ranking for each of the species. For computational reasons, we calculated only 100 SI. All resampling analyses were done in R (R Development Core Team 2014; http://www.r-project.org/; code available upon request).
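The sample-size check can be illustrated with a short Python sketch. It assumes that each species' records are subsampled without replacement, which the text does not specify, and the function names are hypothetical.

```python
import random


def si_from_records(records):
    """SI = 3x + 2y + z computed from a list of habitat labels."""
    weights = {"agrestal": 3, "ruderal": 2, "natural": 1}
    return sum(weights[habitat] for habitat in records) / len(records)


def subsampled_ranking(records_by_species, fraction, rng):
    """Rank species by SI after keeping a random `fraction` of each species' records."""
    scores = {}
    for species, records in records_by_species.items():
        k = max(1, round(len(records) * fraction))
        scores[species] = si_from_records(rng.sample(records, k))
    # Highest SI (weediest species) first.
    return sorted(scores, key=scores.get, reverse=True)
```

Repeating `subsampled_ranking` many times for fractions of 0.25, 0.50 and 0.75 and counting how often each species holds each rank gives a picture of how stable the ranking is.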

Herbarium data
Four species were found in all three types of habitat (agrestal, ruderal and natural), in varying proportions (Table 1). The fifth species, M. tepicense, has not been reported as an agrestal weed in the literature, and we did not find any specimens from cultivated fields either. More than half of the specimens belonged to M. divaricatum and only 3 % to M. tepicense. The frequencies of the other three species were similar (M. perfoliatum: 17 %, M. americanum and M. microcephalum: 14 % each). Most specimens had been collected in natural vegetation (45 %), 41 % in ruderal habitats and only 14 % as agrestal plants.

Field data
We found 173 populations of four of the five species in the area predicted by models based on climatic data (Table 1). We were not able to locate populations of M. americanum; this species has its northern limit in the state and is known from only two collections there. In contrast to what herbarium data would suggest, most populations were ruderal (92 %), 5 % were agrestal and 3 % grew in natural vegetation. The only species found in a variety of cultivated fields and plantations (including maize, avocado, roselle-Hibiscus sabdariffa, green beans and lime) was M. divaricatum. In contrast, M. microcephalum and M. tepicense were mostly found in tropical dry forest and semi-evergreen tropical forest, respectively. As with herbarium data, most of the populations belonged to M. divaricatum (86 %) and very few to M. tepicense (2 %), although M. microcephalum and M. perfoliatum were also low in numbers (8 and 4 %, respectively).

Synanthropy index and species-habitat associations
Species of Melampodium varied in their degree of weediness, as was reflected by their SI values (Table 2). As expected, M. divaricatum had the highest SI values (SI_FIELD = 2.06, SI_HERBARIUM = 1.76), and M. tepicense the lowest (SI_FIELD = 1.67, SI_HERBARIUM = 1.13). Values of SI obtained from field data tended to be higher than those derived from herbarium data, but the relative ranking of species was conserved.
A χ² independence test for both the herbarium (χ² = 24.59 > χ²(0.05, 8) = 15.51) and the field data (χ² = 48.49 > χ²(0.05, 6) = 12.59) suggested that the species differed in the kind of habitat they occupy. We infer that this relationship probably reflects a biological and not a stochastic phenomenon, as most of the observed SI values, with the exception of those for M. microcephalum and M. perfoliatum as agrestals, fell outside the 95 % CI derived from randomized datasets (Table 3, Fig. 2).
Excluding M. divaricatum from our analyses did not shift patterns in species' null distributions [see Supporting Information-Diagram S1]. These results suggest that the patterns we observed were not driven by M. divaricatum, the species with most herbarium records and populations. The only exception was M. perfoliatum, which appeared to be slightly weedier when we excluded M. divaricatum. The species rankings were robust against a reduction of the sample size, as their relative ranks were maintained when randomly resampling from subsequently smaller datasets. Melampodium divaricatum, M. microcephalum and M. tepicense preserved their first, fourth and fifth places, respectively (Fig. 3); M. perfoliatum and M. americanum occupied intermediate ranks, sometimes interchanging second and third place.
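The degrees of freedom of the two χ² tests follow from the table dimensions: five species by three habitats gives (5 − 1)(3 − 1) = 8 for the herbarium data, and four species by three habitats gives (4 − 1)(3 − 1) = 6 for the field data. A Python sketch of the Pearson statistic is given below; the function names are hypothetical and no counts from Table 1 are reproduced.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a species-by-habitat contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of species and habitat.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)
```

The statistic is then compared with the critical value for the corresponding degrees of freedom, 15.51 or 12.59 at the 0.05 level in the tests reported above.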

Discussion
Our field validation of the relative weediness of the species confirmed, with slight variations, the ranking initially calculated from herbarium data (and the usefulness of distribution modelling). Thus, our results suggest that herbarium data are an appropriate resource for ranking species by their degree of association with human activities, as reflected by synanthropy indices. This is further evidence (Lavoie 2013) that biological collections are important tools for obtaining ecological data. This is the first time herbarium and field data for weed habitat are compared directly and quantitatively, and one of the few comparisons for any group of plants. The only other example we found, Garcillán and Ezcurra (2011), evaluated herbarium and direct field data from vegetation plots for an island, in order to calculate their rarefaction curves (and coverage of rare species). They found that collections over-represent rare species, but due to the same collection bias, also deliver more complete species lists.
Their results are compatible with ours: the herbarium data yielded consistently lower SI values than field data. Melampodium divaricatum, the most widespread and abundant species (Turner and King 1962; Stuessy 1979), represented 51 % of the herbarium specimens, but 86 % of the populations found in the field. This confirms previous findings that collectors do not collect common species in proportion to their presence (Guralnick and Van Cleve 2005; Garcillán and Ezcurra 2011). The data also agree with previous observations suggesting that biologists tend to collect in easily accessible places with natural vegetation and often avoid secondary vegetation (Rich and Woodruff 1992; Kadmon et al. 2004; Pyke and Ehrlich 2010; Kramer-Schadt et al. 2013).
Our field observations indicated that Melampodium species were basically ruderal. However, label data from herbarium specimens often lack data on the microhabitat where plants really grow. Misrepresentation of habitat in our data remains a possibility. For example, a label may indicate a forested area, but the plants might have been actually growing on the side of the path or in a clearing, in a more ruderal setting. However, we have no reason to suspect that such biases affected species differentially in a systematic way.
A shortcoming of studies based on herbarium or database records is the potential for misidentifications and errors in the label information. In our case, these sources of error should be low, as we put considerable effort into curating both collections and database information, the taxonomy and phylogeny of Melampodium are well known and all specimens had been recently examined and annotated by a specialist in the genus or family. However, this should be kept in mind when using uncorroborated data from, for example, public databases.
As a final note, we would like to point out that the relative ranking of Melampodium species based on their SI values   2008), as has the relationship between some other features characterizing weediness (e.g. early and extended flowering, in some cases, the annual life span; the 'general purpose genotype') and invasiveness (Rejmánek 2000; Pyšek and Richardson 2007).

Conclusions
Herbarium specimens reflect the weediness of species relative to each other. Calculating an index from herbarium or database records is an economical first step for identifying weeds or species with weedy characteristics and ranking them in relative order, which is useful for comparative studies on the biology and ecology of a group of plants. However, collection data do not lead to absolute rankings of weediness or synanthropy, due to collection bias. Detailed field studies are necessary for obtaining definite values for this ecological trait. Finally, our work also shows the utility of species distribution modelling as a tool for guiding field surveys.

Sources of Funding
The work of the first author was supported by the CONACyT (Mexico) project NAYARIT-2008-C 01-93389 ('Fortalecimiento y consolidación del Doctorado en Ciencias Biológicas y Agropecuarias con énfasis en el área de ciencias ambientales de la Universidad Autónoma de Nayarit').

Supporting Information
The following additional information is available in the online version of this article - Table S1. General distribution, habitat and phenology of the five species of Melampodium selected for this study, based on Stuessy (1972) and Robinson (1901). Table S2. Environmental variables used as predictors of the model of the potential distribution. Figure S1. A map of the general collections of the five Melampodium species of this study, previous to this work, with major roads. It shows that the large majority of the collections were made along roadsides.
Diagram S1. Histograms of 1000 null distributions generated by resampling the specimen data in R (Core Development Team), and excluding Melampodium divaricatum.