Species distribution modelling for conservation of an endangered endemic orchid

Navasota ladies'-tresses is an orchid native to eastern and central Texas. It was listed as endangered by the U.S. Fish and Wildlife Service in 1982 and by the State of Texas soon afterwards. Wang et al. (2015) analyzed field data collected over nine years to identify areas of critical habitat, areas into which the species could expand its range, and areas that might serve as conservation corridors. These results will provide valuable information for those interested in conservation and management of this endangered orchid as well as a framework for the development of future studies of this and other endangered plants.


Introduction
Conservation biologists and natural resource managers are growing increasingly concerned about the manner in which climate change and accelerating habitat fragmentation may negatively affect the long-term viability of threatened and endangered plant species (Wilcove et al. 1998;Pitman and Jorgensen 2002;Brigham and Swartz 2003). To successfully prevent their extirpation, conservation efforts will require detailed studies of species population biology and life-history dynamics, more thorough assessments of the factors contributing to rarity, sophisticated land management and restoration strategies and the development of more robust predictive models that better identify both high-priority conservation locations as well as areas potentially suitable for plant reintroductions (e.g. Falk and Holsinger 1991; Schemske et al. 1994;Maschinski and Haskins 2012).
Orchidaceae is the largest and most diverse family of flowering plants, but it is currently facing unprecedented risks of extinction (Cribb et al. 2003;Swarts and Dixon 2009). Orchidaceae consists of over 1000 genera and most orchid genera contain one or more threatened or endangered species (Cribb et al. 2003). The majority of threatened orchid species are terrestrial orchids, despite the small portion of the family represented by this life form (IUCN 2001). In addition, many terrestrial orchids are rare, with specialized habitat requirements, making them particularly susceptible to habitat fragmentation and modification (Wu and Smeins 2000;Pillon and Chase 2007). Their vulnerability is exacerbated by patchy distributions, specialized mutualisms and generally limited dispersal (Schemske et al. 1994;Coates et al. 2006;Pillon and Chase 2007). Given the high extinction risk to terrestrial orchids, they have been a major conservation concern for many environmental groups. They have often been used as flagship species in conservation initiatives because of their uniqueness and rarity and additionally are often touted as important early warning bioindicators for ecosystem health given their sensitivity to environmental degradation (Cribb et al. 2003;Swarts and Dixon 2009).
Despite substantial conservation emphasis on rare orchids, populations continue to decline (Swarts and Dixon 2009). This is in no small part due to difficulties in designing integrated conservation plans for orchid protection (Whigham and Willems 2003;Swarts et al. 2007). Often, small patchily distributed orchid populations are difficult to detect without thorough surveys of extensive areas, which are often not logistically feasible (Wu and Smeins 2000;Gogol-Prokurat 2011). In addition, many populations are spread across a network of private or otherwise inaccessible land. Incomplete censuses of populations limit the ability for conservation planners to determine appropriate areas for protected habitat and assisted migrations (Cuperus et al. 1999;Wan et al. 2014). Effective conservation planning requires the identification of areas of suitable habitat in order to facilitate prioritization and appropriately identify land for the creation of preserves or easements and mitigation for habitat modification or loss (Rodríguez et al. 2007;Gogol-Prokurat 2011;Dudley and Bean 2012).
Species distribution models are invaluable tools for focussing conservation efforts of species with incomplete distribution records (Fleishman et al. 2002;Buse et al. 2007;Fandohan et al. 2011;Gogol-Prokurat 2011). Species distribution models comprise a suite of quantitative tools that statistically relate species presence and absence data to environmental predictor variables (Guisan and Thuiller 2005;Gogol-Prokurat 2011). They elucidate habitat requirements, aiding in the development of distribution predictions essential to meeting endangered species conservation objectives with limited site occupancy data and resources for additional data collection (Guisan and Thuiller 2005;Hirzel et al. 2006). They can be used to identify habitat suitable for conservation by providing maps of probabilities that the species would occur in a given area (Ibisch et al. 2002;Jiménez-Valverde and Lobo 2007), determine the effects of land-use change on endangered species habitat (Rodríguez et al. 2007) and explore the effect of global change on endangered species distributions (Jiménez-Valverde and Lobo 2007;Thuiller et al. 2008).
Studies involving species distribution modelling have increased in recent years and several methods are currently applied to address ecological issues ). These include statistical models such as generalized linear models (Wang and Grant 2012) and generalized additive models (Leathwick et al. 2006), machine-learning models such as CLIMEX (Pattison and Mack 2008), GARP (Stockman et al. 2006) and Maxent (Wilting et al. 2010), as well as methods drawing on insights and techniques from statistical and machine learning approaches such as random forests (Prasad et al. 2006) and boosted regression trees (Chiou et al. 2013). Boosted regression trees are a relatively new method compared to others. Boosted regression trees have their origins within machine learning, but subsequent developments in the statistical community have led to a reinterpretation of boosted regression trees as an advanced form of regression (Elith et al. 2008). However, boosted regression trees differ fundamentally from statistical methods and machine-learning approaches such as Maxent that produce a single 'best' model in that boosted regression trees combine a large number of relatively simple tree models adaptively to optimize predictive power (Leathwick et al. 2006 tree (a rule-based classifier) that partitions observations into groups having similar values for the response variable based on a series of binary rules constructed from the independent variables (Hastie et al. 2001). Boosted regression trees have been used to predict the distribution of a threatened species, rabbitsfoot (Quadrula cylindrical), via the inclusion of independent variables measured at markedly different spatial scales (Hopkins 2009).
In this study, we analysed the relationships between the occurrence of Spiranthes parksii, an endangered terrestrial orchid, and several climatic and landscape variables deemed important to S. parksii distribution. Spiranthes parksii is a state and federally listed endangered orchid, endemic to central Texas, USA. Its distribution is limited to 13 counties and it appears to occupy a restricted habitat within those counties, often observed on the edges of upland drainages in small open grass/ shrub patches within post-oak savannah/woodland communities (Wonkka et al. 2012). Spiranthes parksii populations are threatened by land conversion to agriculture and lignite mines, urban development and woody encroachment by trees and understory shrubs into post-oak savannahs (Wonkka et al. 2012). Like many terrestrial orchids, S. parksii is a mycoheterotroph, requiring a mycorrhizal symbiont for germination and seedling development, remaining closely associated with the fungi throughout its life cycle (Ariza 2013). This specialized mutualism, along with a limited dispersal shadow, and specialized habitat requirements leads to a patchy distribution, making S. parksii detection difficult. In addition, the counties in which the populations are located consist largely of privately owned land, inaccessible to surveyors. High rates of development in this region necessitate effective prioritization of lands for mitigation efforts. Given the conservation concerns and limited availability of survey data for this species, we developed a species distribution model to aid in effective conservation of S. parksii. In particular, we used boosted regression trees to (i) identify potential factors influencing S. parksii distribution, (ii) quantify the relative importance of each factor and (iii) predict suitable S. parksii habitat. The model developed herein will provide an adaptive quantitative tool which can be used to facilitate future S. parksii surveying, research and conservation efforts and, with slight modification, should be applicable to other endangered species with similarly limited ranges.

Study area and data sources
The study area covers several counties (Bastrop, Fayette, Milam, Freestone, Leon, Madison, Grimes, Robertson and Brazos) in central Texas, USA (Fig. 1). The area is largely post-oak savannah intermixed with open grassland, cropland and urban and suburban development. The climate is humid subtropical with an average minimum temperature of 14 8C, an average maximum temperature of 26 8C and average annual precipitation of 105 cm bimodally distributed with peaks in the fall and the spring.
We obtained geo-referenced data on (i) presence and absence of S. parksii regularly sampled between 2004 and 2012 from the US Fish and Wildlife Service, Texas Parks and Wildlife Department, Texas Department of Transportation and the Texas A&M University team working under Drs Smeins and Rogers, (ii) average climatic conditions in our study area from the PRISM Climate group (2013) and (iii) landscape features including topographic characteristics derived from digital elevation models (Gesch 2007), land cover (Fry et al. 2011), soil characteristics (Soil Data Mart 2013) and geology (Stoeser et al. 2013). Spiranthes parksii surveys were conducted yearly during peak flowering for the duration of the study with systematic sampling across areas of known S. parksii occupancy and additional areas deemed likely to contain S. parksii due to similarity in habitat characteristics to known areas of occupancy. Spiranthes parksii locations were marked with GPS coordinates for future observation.
Our geo-referenced data for S. parksii were primarily comprised of occurrences. Pseudo-absences, or random points in the study area where the focal species has not been documented, are typically used as a surrogate for absence records in studies with such data limitations. However, there may be limited confidence in absence at those points, depending on the sampling strategy (Phillips et al. 2009). As a more definitive sample of absence points, we used locality records for an ecologically similar congener, S. cernua. Given the similar ecology and phenology of the two species (Ariza 2013), and the conservation status of S. parksii, S. parksii would have been recorded if found during surveys for S. cernua. Thus, we confidently use records for S. cernua that lack concurrent records for S. parksii.

Data analysis
We selected 58 variables that have been suggested in the literature as potential predictors of the presence of S. parksii in central Texas (Wonkka et al. 2012;Krupnick et al. 2013), including various climatic conditions and landscape features [see Supporting Information]. We analysed relationships between the occurrence of S. parksii and the potential explanatory variables by aggregating the explanatory variable data associated with S. parksii presence (106 cells) and absence (99 cells) into polygons representing a resolution of 800 × 800 m cells, aligned with the climate data that we used, AoB PLANTS www.aobplants.oxfordjournals.org in central Texas. We then merged these data into a grid of 37 427 800 × 800 m cells using ArcGIS 9.0 (ESRI 2009). The climate data included annual average maximum and minimum temperatures, and precipitation for 1981 -2010 (PRISM Climate Group 2013). We derived topographic characteristics for the 800 m grid cells of analysis from the 30 m National Elevation Dataset (Gesch 2007) using SAGA GIS version 2.1.0 (www. sagagis.org). We also calculated the average soil waterholding capacity, percentage of silt, sand and clay in each soil type based on STATSGO soil data (Soil Data Mart 2013) using R version 3.0.2 (R Core Team 2013). We used spatial overlay tools in SAGA GIS version 2.1.0 and Manifold GIS version 8.0.28 to aggregate the various data layers [see Supporting Information] into a single dataset for analyses.
We conducted our analysis using boosted regression trees which combine decision trees and a boosting algorithm with a form of logistic regression (Friedman 2002;De'ath 2007;Elith et al. 2008). For boosted regression trees, the probability (P) of S. parksii occurrence (y ¼ 1) at a location with the potential explanatory variables (X ) is given by P(y ¼ 1|X ) and is modelled via the logit: logit P(y ¼ 1|X ) ¼ f (X ). We fitted our model in R (R Development Core Team 2006 version 2.14.1) using the gbm package version 1.5-7 (Ridgeway 2006) and code provided by Elith et al. (2008). The optimal model was determined following the recommendations of Elith et al. (2008) by altering the learning rate and tree complexity (the number of split nodes in a tree) until the predictive deviance was minimized without over-fitting, and by limiting our choice of the final model to those that contained at least 1000 trees (where each successive tree is built for the prediction residuals of the preceding tree). Once the optimal combination of learning rate and tree complexity was found, model performance was evaluated using a 10-fold cross-validation procedure with resubstitution. For each cross-validation trial, 80 % of the dataset was randomly selected for model fitting and the excluded 20 % was used for testing. We calculated the response variance explained, the area under the receiver operator characteristic curve (AUC), the overall accuracy, the omission error rate and the commission error rate based on the aggregated CV results. We evaluated the reliability and validity of our models as fair  2000). We then used the gbm library to derive the relative influence of each potential explanatory variable in the model and constructed partial dependence plots for the most influential variables (Elith et al. 2008). Finally, we used this optimal model to calculate probability of S. parksii presence in each cell in central Texas and superimposed these probabilities of occupancy on a map of the study area using ArcMap 9.0 (ESRI 2009).

Results
Analyses of 500 combinations of tree complexity (ranging from 3 to 7) and learning rate (ranging from 0.001 to 0.01) produced models with between 450 and 3900 trees whose values of predictive deviance ranged from 0.582 to 0.624. The optimal model had a tree complexity of 5, a learning rate of 0.003 and a total of 1200 trees. Model predictive deviance was 0.582 + 0.0079 with 95.6 % of the total response variance explained. The AUC score was 0.940 + 0.016 ('very good' ability to discriminate between species presence and absence) and the overall accuracy was 91.7 %. The commission (false positive) error rate was 6.8 % and the omission (false negative) error rate was 9.8 %. Recursive feature elimination tests showed that 45 variables could be removed from the model before the resulting predictive deviance exceeded the initial predictive deviance of the model with all variables.
Thirteen variables were included in the final model (Table 1), with variables associated with climatic conditions and landscape features accounting for 53.5 and 46.5 %, respectively, of the contribution in the overall model (Fig. 2). Examination of the relative contribution of the predictor variables indicated that the top four accounted for 70.95 % of the contribution in the overall model. Of the four most influential model variables, three were climatic conditions and one was a landscape feature. Mean annual precipitation, mean annual minimum temperature and mean annual maximum temperature were the first, third and fourth most influential variables, contributing 26.93, 17.26 and 9.31 %, respectively. Mean elevation was the second most important variable contributing 17.45 %.
Partial dependence plots indicated that S. parksii occurrences were associated with climatic conditions characterized by mean annual precipitation between 1050 and 1120 mm (Fig. 3A), mean annual minimum temperature between 13.75 and 14.38 8C (Fig. 3C) and mean annual maximum temperature between 26.00 and 26.25 8C (Fig. 3D). Occurrences also were associated with landscape features characterized by (i) an altitude between 50 and 80 m (Fig. 3B), (ii) a slope ratio between 0.05 and 0.09 % (Fig. 3H), (iii) areas with ,20 % pasture (Fig. 3E), 20-73 % of evergreen forest (Fig. 3I), 50 -73 % of deciduous forest (Fig. 3J), or ,5 % of developed open space (Fig. 3K), (iv) soil with ,20 % clay (Fig. 3F)   Our analyses suggest that potential habitat for S. parksii in central Texas, considering its association with the variables mentioned in the previous paragraph, is most likely to be (i) the eastern portions of Leon and Madison Counties, (ii) the southern portion of Brazos County, (iii) a portion of northern Grimes County and (iv) along the borders between Burleson and Washington Counties (Fig. 4).

Discussion
Plant distributions are limited by the availability of suitable habitats (Aitken et al. 2007). For rare plants, especially those with limited geographic ranges, narrow habitat specificity can further limit distribution. While climate is an important determinant of plant distribution at landscape levels (Pearson et al. 2004), soil properties and biotic interactions determine habitat availability at local scales (Raven 2002). Even for edaphic endemics, combinations of variables predict distributions more accurately than simple models driven entirely by soil-related parameters (Arundel 2005). This is especially valid for predicting orchid species distributions which are highly dependent on interactions with pollinators and mycorrhizal symbionts (Rasmussen 2002;Rasmussen and Rasmussen 2009).
Our model establishes the importance of both climatic variables and landscape features to the distribution of S. parksii. Spiranthes parksii is associated with the higher   Table 1 for the description of variables). end of the range of average annual precipitation for the area (1050 -1120 mm). This agrees with findings by Ariza (2013) that showed higher soil moisture as a major explanatory variable differentiating S. parksii occurrence with the more abundant sympatric species S. cernua. Spiranthes parksii is also found in areas with high minimum (13.8-14.4 8C) and maximum (26-26.3 8C) mean annual temperatures. This likely contributes to S. parksii distribution through unique life-history characteristics including summer dormancy and potential early fall emergence of rosettes (Wonkka et al. 2012). Summer dormant plants existing below the soil as rhizomes can withstand high peak temperatures, but with above-ground photosynthetic vegetation being present in the winter, the plants favour areas with higher winter temperatures to minimize potential frost damage.
Given the importance of climatic variables to S. parksii distribution, climate change could have an extensive impact on the availability of suitable S. parksii habitat in the future. Climate change has been shown to cause distribution shifts for many species of plants and can increase the likelihood of local extinction as sessile plant species are unable to disperse or adapt to a rapidly changing climate (Hoegh-Guldberg et al. 2008;Nicolè et al. 2011;Parmesan et al. 2013). This is further exacerbated in highly fragmented areas, such as the range of S. parksii, where human alteration can act as a barrier to dispersal processes (Preston et al. 2008). Although there is considerable uncertainty in predictions of future temperatures and precipitation for a particular region, projections averaged across ensemble models suggest increased summer and winter temperatures and decreased average annual precipitation across the southern United States (Deser et al. 2012). In addition, precipitation is expected to become more variable, with more frequent drought events and more precipitation occurring in fewer rainfall events (Coumou and Rahmstorf 2012;Deser et al. 2012; Intergovernmental Panel on Climate Change 2014). While warmer temperatures could increase the habitat available to S. parksii, there is likely an upper bound on temperatures that the orchid can withstand. In addition, reduced precipitation and greater frequency of drought could cause many currently suitable areas of habitat to become too dry to support populations of the orchid given that our model shows S. parksii to occur in the wetter portions of its range.
Several landscape features also proved important to S. parksii distribution. Elevation and slope are also important to determining S. parksii occurrence. Although slope and elevation are not mechanistic variables, they often can be proxies for environmental variables, such as soil properties and plant-available water, which can drive plant distributions (Lassueur et al. 2006). Elevation, derived from digital elevation models, is the most important landscape feature for predicting S. parksii distribution. They are found at the low end of the elevation range for the area (50 -80 m). This is reflective of the specific habitat preference for margins of drainages (Ariza 2013). Similarly, S. parksii occur in areas with maximum slope ratios for the area (0.05 -0.09), which also reflects their occurrence between flatter open areas and margins of drainages.
Our model also showed soils and vegetation cover type to be important for the distribution of S. parksii. This is consistent with past studies that found high S. parksii occurrence in the Manning and Wellborn geological formations, suggesting that S. parksii might be an edaphic endemic. This is also consistent with the life cycle dependence on mycorrhizal fungi (Ariza 2013). Orchid distributions are thought to be restricted largely by interactions with pollinators and their mycorrhizal symbionts (Waterman and Bidartondo 2008). For S. parksii, pollination is likely less important (as evidenced by high levels of asexual reproduction) than fungal mutualism (Ariza 2013). Fungi tend to be patchily distributed across a landscape (Batty et al. 2001), and their distributions are driven largely by local mechanisms, especially soil properties such as soil moisture and soil organic matter (Brundrett and Abbott 1994). Ariza (2013) found higher summer soil moisture, lower pH, percent sand and abundance of soil organic matter to be the most important distinguishing characteristics between S. parksii and S. cernua occurrence. Our model suggests that S. parksii soils usually are found in areas with ,20 % clay and 55-94 % sand. The importance of soil organic matter to S. parksii likely is related to both the ability of organic matter to increase the water-holding capacity of well-drained sandy soils, and also its importance as a substrate for fungi. Orchids also tend to exhibit narrow specificity with fungi (Waterman and Bidartondo 2008;Rasmussen and Rasmussen 2009). Therefore, it is likely that S. parksii distribution closely tracks particular fungal species. The distribution of those fungi is likely driven by specific soil inputs as well as soil properties, which could explain the importance of vegetative cover (,20 % pasture, 20 -73 % evergreen, 50 -73 % deciduous) to S. parksii distribution. Fungi likely require leaf litter as a substrate for decomposition. However, a thick layer of leaf litter might inhibit germination of S. parksii seeds. This is supported by the findings of Ariza (2013) that uniform leaf litter cover was an important determinant of S. parksii occurrence. Soils have been found to be important determinants of distribution for other orchids. Bowles et al. (2005) found soil type to be the most important variable determining Plantanthera leucophaea distribution and Clark et al. (2004) determined climate, soils and vegetation type to accurately predict distributions of Cryptostylis hunteriana.
Areas with high estimated probabilities of S. parksii occurrence are distributed patchily across the range. There are some larger connected areas of high probability, but many areas with high likelihood of occurrence are punctuated with lower likelihood patches. Plants with limited dispersal shadows are highly susceptible to local extinctions due to stochastic events in fragmented habitats (Bourg et al. 2005). The distribution map generated with our model suggests fragmented habitat for S. parksii, which has limited dispersal due to tiny seeds and the necessity for mycorrhizal associations for germination. Many seedlings are found in close proximity to adult plants (Ariza 2013). Patchy distributions often are associated with limited habitat availability (Swarts and Dixon 2009). However, if there is little chance for long distance dispersal and patchily distributed areas of suitable habitat (Hurtt and Pacala 1995), movement into unoccupied areas of suitable habitat could be restricted, posing problems for species with high frequencies of local extinctions (Primack and Miao 1992). This is especially significant in areas that are highly fragmented by development and agriculture such as the central Texas range of S. parksii. Additionally, distribution shifts necessary for species continuation in the face of climate change require areas of suitable habitat be attainable for future recruitment and persistence (Carey and Brown 1994). Existence of such sites becomes increasingly less likely with high habitat specificity, limited dispersal distances and fragmented habitat. Our model possesses potential utility for understanding S. parksii meta-population dynamics more thoroughly to determine the level of threat posed to species viability in the face of increased landscape fragmentation.
The model developed in this study has potential utility beyond the scope of this work. It can be adapted to incorporate new information and data as they become available. Model accuracy increases with increased amount and accuracy of presence and absence data and can be updated to include new information to further refine distribution predictions ). Additionally, modelling multiple scales could increase the accuracy of prediction. Ecological processes function at different scales (Turner 1989;Levin 1992). Our model explores landscape level scales that drive distribution, but refining the model resolution could yield important information regarding distribution at a finer scale (i.e. within a high probability patch). Fine scale mechanisms often regulate the distribution of rare plants with specialized habitat requirements (Menges et al. 1999). For instance, Diez and Pulliam (2007) found that distance from parent was important to germination at the local scale, while soil characteristics were more predictive of germination at larger scales for Goodyera pubescencs. One opportunity for improvement of the model is to incorporate data related to disturbance and biotic interactions (e.g. distribution of pollinators or fungal associates, fire or flooding effects on habitat quality) in order to reflect the potential for non-equilibrium system functioning (Schrö der and Seppelt 2006). This could prove especially important for species such as rare orchids that have specific biotic interactions (Ettema and Wardle 2002) and tend to respond to specific disturbance regimes (Clark et al. 2004). Models incorporating landscape changes and mechanistic drivers can better capture fluctuations in habitat suitability over time (Kearney et al. 2008). This could increase the accuracy of distribution predictions in systems where habitat quality fluctuates in response to non-equilibrium processes ).

Conclusions
A suite of climatic variables and landscape features can be used to predict the distribution of the endangered terrestrial orchid, S. parksii which is endemic to central Texas. Many of these variables are related to soil resources which potentially influence the distribution of the mycorrhizal fungi the orchid depends on for germination and lifetime nutrient acquisition (Rasmussen and Rasmussen 2007;Ariza 2013). The species' potential habitat is patchily distributed as a result of this dependence on soil resources and specific habitat requirements (Batty et al. 2001). Narrow habitat specificity combined with potential dispersal limitations necessitates an integrated conservation approach that includes research to determine basic ecological and biological processes important to S. parksii population viability, habitat management and conservation and an understanding of the effects of fragmentation and habitat degradation on dispersal of S. parksii into suitable habitat (Swarts and Dixon 2009). Species distribution models can assist in the development of an integrated conservation strategy (Kiesecker et al. 2010). They can help to fill knowledge gaps resulting from limited resources for research. Similarly, they can help focus future survey and research efforts on areas with a high likelihood of occurrence (Parris 2002;Guisan et al. 2006). Species distribution models also can be used to select areas for conservation offsets or easements (Gibbons and Lindenmayer 2007;Kumar and Stohlgren 2009), explore alternate management scenarios (Guisan et al. 2013), frame research questions (Aitken et al. 2007), explore issues related to meta-population dynamics (Bourg et al. 2005) and predict potential responses to climate change (Carey and Brown 1994). The species distribution model developed through this research is adaptive. It can incorporate new information as it becomes available to improve accuracy and resolution of our analyses. Our methodology could also be employed to develop distribution maps for other rare species of conservation concern.
Sources of Funding