The GenTree Platform: growth traits and tree-level environmental data in 12 European forest tree species

Abstract

Background: Progress in the field of evolutionary forest ecology has been hampered by the huge challenge of phenotyping trees across their ranges in their natural environments, and the limitation in high-resolution environmental information.

Findings: The GenTree Platform contains phenotypic and environmental data from 4,959 trees from 12 ecologically and economically important European forest tree species: Abies alba Mill. (silver fir), Betula pendula Roth. (silver birch), Fagus sylvatica L. (European beech), Picea abies (L.) H. Karst (Norway spruce), Pinus cembra L. (Swiss stone pine), Pinus halepensis Mill. (Aleppo pine), Pinus nigra Arnold (European black pine), Pinus pinaster Aiton (maritime pine), Pinus sylvestris L. (Scots pine), Populus nigra L. (European black poplar), Taxus baccata L. (English yew), and Quercus petraea (Matt.) Liebl. (sessile oak). Phenotypic (height, diameter at breast height, crown size, bark thickness, biomass, straightness, forking, branch angle, fructification), regeneration, environmental in situ measurements (soil depth, vegetation cover, competition indices), and environmental modeling data extracted by using bilinear interpolation accounting for surrounding conditions of each tree (precipitation, temperature, insolation, drought indices) were obtained from trees in 194 sites covering the species’ geographic ranges and reflecting local environmental gradients.

Conclusion: The GenTree Platform is a new resource for investigating ecological and evolutionary processes in forest trees. The coherent phenotyping and environmental characterization across 12 species in their European ranges allow for a wide range of analyses from forest ecologists, conservationists, and macro-ecologists. Also, the data here presented can be linked to the GenTree Dendroecological collection, the GenTree Leaf Trait collection, and the GenTree Genomic collection presented elsewhere, which together build the largest evolutionary forest ecology data collection available.


The impacts of climate change and land-use change on forests are already severe, as observed, for example, following the extreme summer drought of 2018 that triggered a massive increase in mortality in Central European forests [1]. Furthermore, changes are expected to be acute in the future, altering distribution ranges and ecosystem functioning, as well as the interactions among species [2]. Forecasts indicate that near-surface temperature will shift poleward at mean rates of 80-430 m yr⁻¹ for temperate forests during the 21st century [3]. This translates into northward shifts of trees' bioclimatic envelopes from 300 to 800 km within one century [3]. More importantly, the frequency and intensity of drought events, heat waves, forest fires, and pest outbreaks [4] are expected to increase.

In the light of these changes, species and forest ecosystem resilience will depend on the extent and structure of phenotypic plasticity, genetic variation, and adaptive potential, as well as dispersal ability. From the results of extensive networks of field experiments (provenance trials), it has long been shown that tree species are locally adapted at multiple spatial scales.
In Europe, where most tree populations have established following post-glacial recolonization, such patterns of local adaptation must have developed rapidly and despite long generation time and extensive gene flow [5], a process enabled by high levels of within-population plasticity, genetic and epigenetic variation, and large population sizes [6]. Recent work has shown that genetic variation for stress response may be strongly structured along environmental gradients, such as water availability [7], temperature [8], or photoperiod [9]. However, the spatial patterns of current adaptation in particular phenotypic traits are only partly informative regarding the potential for future adaptation under a changing climate. To advance our understanding of the adaptive potential of trees, it is crucial to evaluate multiple traits in parallel to be able to model their putative response to new environmental conditions.

Recently, substantial effort has been made to identify specific genes and gene combinations that have undergone selection, by associating mutations at candidate loci with phenotypes related to stress events [10,11] or with environmental variables [12]. This latter example by Yeaman and co-workers [12] is one of the first association studies in forest tree species on a large genomic scale and the first to investigate convergent local adaptation in distantly related tree species. However, progress in this field has been hampered by limited genomic resources, the lack of small-scale, individual tree-level environmental information [13], and the huge challenge of phenotyping trees in their natural environments [14,15].

The GenTree Platform aims to address these challenges by providing individual level, high-resolution phenotypic and environmental data for a set of up to 20 sampling sites for each of twelve ecologically and economically important forest tree species across Europe.
For a subset of seven species (B. pendula, F. sylvatica, P. abies, P. pinaster, P. sylvestris, Populus nigra, and Q. petraea), the sampling of sites was carried out in pairs, i.e., each pair contained two stands that were close enough to be connected by gene flow but situated in contrasting environments.

The sampling design described here was used for collecting phenotypic traits and ecological data. Also, tree ring and wood density measurements for the same trees were assessed [16] [...]

Table 1.

Sampling strategy
To optimize the sampling design for genome scans and association studies, we followed the recent theoretical work by Lotterhos and Whitlock [19,20], which indicates that a paired sampling design has more power to detect the genomic signatures of local adaptation. Using this framework, populations from across the natural range of a species are sampled in pairs, with the two sites in each pair situated geographically close enough to be genetically similar at neutral genes due to a common evolutionary history and ongoing gene flow, but in distinct selective niches such that the local fitness optimum differs between the two sites. This sampling confers more power to detect evidence of selection in the genome through either association with environmental or phenotypic variables or the detection of outliers (e.g. for genetic differentiation, FST) (ibid.). Trees are very amenable to a pairwise approach since they are known to be locally adapted, often at fine spatial scales [21,22] and irrespective of gene flow distances [6]. This strategy was followed for a subset of seven species (see above) for which genomic resources were available (i.e. full or draft genome).

Such local niche contrasts are neither easy to identify nor readily available when environments are very homogenous. Therefore, a second principle of the sampling design was to cover a large part of each species' natural geographic range (Fig. 1) and environmental space (Fig. 2) to capture selective niche variation. Finally, sites with a history of intensive management or any other intense and obvious anthropogenic or natural disturbances were avoided. This strategy was followed for all the 12 species.

Selection of trees on sites
A minimum of 25 trees was sampled per site to capture the natural phenotypic and genetic variability. Trees had to be mature but not senescent, dominant or codominant, and had to show no signs of significant damage due to pests and diseases or generally low vigor. Sampled trees were at least 30 meters apart and, where possible, were chosen along several parallel linear transects across each site, typically resulting in 2-4 transects per sampling site to keep the overall sampling area below 3 hectares.
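
The 30 m minimum spacing can be checked from recorded tree coordinates. The following is a minimal sketch, not part of the GenTree protocol: the helper names are hypothetical, coordinates are assumed to be in decimal degrees, and distances use the haversine formula with an assumed mean Earth radius of 6,371 km.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pairs_too_close(trees, min_dist_m=30.0):
    """Return (id_a, id_b, distance_m) for every pair closer than min_dist_m.

    `trees` maps a (hypothetical) tree ID to a (latitude, longitude) tuple.
    """
    close = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(trees.items(), 2):
        d = haversine_m(pos_a[0], pos_a[1], pos_b[0], pos_b[1])
        if d < min_dist_m:
            close.append((id_a, id_b, d))
    return close
```

At the scale of a 3-hectare site, GPS error can be of the same order as the spacing threshold, so a check like this is best read as a flag for manual review rather than a hard filter.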

Site and tree metadata
Sites were labeled by a two-letter country code (ISO 3166-1 alpha-2) followed by a two-letter species code and a two-digit site number (Table 1) [...] system. GPS devices were also used to record the tree's elevation, either directly or through post-hoc positioning in digital elevation models. The local aspect at the site of the tree was measured by a compass in five-degree steps in the direction of the steepest slope.

The metadata for each site consist of an ID code (see above), sampling date, location (GPS coordinates, see above), and elevation in meters above sea level (m a.s.l.). Each stand was also characterized as being monospecific or mixed (in the latter case the most common co-occurring species was noted), stand structure was noted as single or multiple layered, and the age distribution as even or uneven (categorical variables).
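
The site label scheme (country code, species code, site number) lends itself to simple validation. The sketch below is illustrative only: the text does not state how the three parts are joined, so the pattern accepts them written directly together or separated by "_" or "-", and the example labels in the usage note are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: the parts may be concatenated or joined by "_" / "-".
SITE_ID_PATTERN = re.compile(r"^([A-Z]{2})[_-]?([A-Z]{2})[_-]?(\d{2})$")


class SiteId(NamedTuple):
    country: str  # two-letter country code (ISO 3166-1 alpha-2)
    species: str  # two-letter species code
    number: int   # two-digit site number


def parse_site_id(label: str) -> SiteId:
    """Split a site label into country code, species code, and site number."""
    match = SITE_ID_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a valid site label: {label!r}")
    country, species, number = match.groups()
    return SiteId(country, species, int(number))
```

For example, `parse_site_id("ES_PP_01")` would yield country `ES`, species code `PP`, and site number 1; the check only covers the shape of the label and does not verify that the country code is an assigned ISO code.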

Competition index at tree level
Competition indices were calculated following Canham et al. [23] and Lorimer et al. [24]. Specifically, the first index following Lorimer [24] was calculated as [...] This index assumes that the net effects of neighboring trees vary as a direct function of the size of the neighbors and as an inverse function of the distance. For this purpose, the distance to each of the five nearest neighbors of each target tree and the diameter at breast height (DBH) of each neighbor were measured.

Moreover, it was noted whether competitor trees were conspecific to the target tree or not. Each multi-stemmed tree was considered as a single competitor where each stem larger than 15 cm DBH was measured and added to the sum of means.
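
The exact formula is not reproduced in this text. As a hedged illustration, the sketch below implements the simplest index consistent with the description, a sum over neighbors of size divided by distance. The function name and the choice not to scale by the target tree's DBH are assumptions of this example, not the published definition.

```python
def distance_weighted_competition(neighbors):
    """Sum of neighbor DBH divided by distance to the target tree.

    `neighbors` is an iterable of (dbh_cm, distance_m) tuples, e.g. the five
    nearest neighbors. Assumed form: CI = sum(dbh_j / dist_j), which grows
    with neighbor size and shrinks with distance.
    """
    total = 0.0
    for dbh_cm, distance_m in neighbors:
        if distance_m <= 0:
            raise ValueError("distance must be positive")
        total += dbh_cm / distance_m
    return total
```

With neighbors of 30, 20, and 10 cm DBH at 3, 4, and 5 m, the index is 10 + 5 + 2 = 17; a large, close neighbor therefore dominates the sum.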

Environmental characteristics within subplots around each tree
Surrounding each target tree, slope, vegetation cover (without tree cover), and stone content were assessed in a 10 m x 10 m plot. The slope was assessed using a clinometer. Vegetation and rock cover were estimated in the classes <5%, 5-20%, 20-40%, 40-60%, 60-80%, 80-95%. Soil depth was estimated at three random points in the quadrat to a maximum of 60 cm with a pike and was averaged across these three values.
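
The soil depth value per tree is a mean of three capped readings. A minimal sketch, assuming that any reading deeper than the 60 cm limit is recorded as 60 cm (the function name is hypothetical):

```python
MAX_DEPTH_CM = 60.0  # measurement limit stated in the text


def mean_soil_depth(readings_cm):
    """Average three soil depth readings, capping each at MAX_DEPTH_CM."""
    if len(readings_cm) != 3:
        raise ValueError("expected exactly three readings per plot")
    capped = [min(float(r), MAX_DEPTH_CM) for r in readings_cm]
    return sum(capped) / len(capped)
```

Because of the cap, a plot mean of 60 cm indicates soil at least that deep at all three points rather than an exact depth.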

Regeneration
In the same 10 m x 10 m plots, natural regeneration of the target species was assessed according to the following four classes: absent (no recruit visible), scattered (few/scattered individuals), grouped (presence of scattered groups within the plot), and abundant (regularly spread all over the plot). The classes are indicated in the database with values from 1 to 4. As this method cannot resolve maternity, the results indicate realized fecundity at the stand level.
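
When reading the database, the numeric codes can be mapped back to class names. This sketch assumes the codes follow the order in which the classes are listed (1 = absent through 4 = abundant); that ordering is an assumption and should be confirmed against the database documentation.

```python
from enum import IntEnum


class Regeneration(IntEnum):
    """Regeneration classes; the code-to-class order is assumed, not confirmed."""
    ABSENT = 1     # no recruit visible
    SCATTERED = 2  # few/scattered individuals
    GROUPED = 3    # scattered groups within the plot
    ABUNDANT = 4   # regularly spread all over the plot


def decode_regeneration(code: int) -> str:
    """Return the lower-case class name for a database code (raises on others)."""
    return Regeneration(code).name.lower()
```

Using an `IntEnum` keeps the ordinal nature of the scale, so comparisons such as `Regeneration.GROUPED > Regeneration.SCATTERED` remain valid.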

DBH (cm)
DBH was measured at a stem height of 1.3 m, either with a caliper, by measuring two perpendicular diameters and averaging them, or with a tape, by measuring the circumference of the tree and computing the diameter from that value. Each measurement was performed to the nearest 0.1 cm. If a tree had more than one trunk, all of them were measured and the average was recorded.
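
Both measurement routes reduce to short calculations: the caliper value is the mean of two diameters, and the tape value follows from circumference = π × diameter. A minimal sketch with hypothetical function names, rounding to the stated 0.1 cm precision:

```python
import math


def dbh_from_caliper(d1_cm, d2_cm):
    """Mean of two perpendicular caliper diameters, to the nearest 0.1 cm."""
    return round((d1_cm + d2_cm) / 2, 1)


def dbh_from_circumference(circumference_cm):
    """Diameter from tape-measured circumference (d = C / pi), to 0.1 cm."""
    return round(circumference_cm / math.pi, 1)


def dbh_multi_stem(stem_dbh_cm):
    """Average DBH across all trunks of a multi-stemmed tree, to 0.1 cm."""
    if not stem_dbh_cm:
        raise ValueError("at least one stem is required")
    return round(sum(stem_dbh_cm) / len(stem_dbh_cm), 1)
```

Note that the two routes agree only for circular stems; for an elliptical cross-section the tape-derived diameter is slightly larger than the mean of the two caliper diameters.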

Height (m)
Height from the ground to the top of the crown was measured using a hypsometer (Nikon Forestry Pro Laser), a laser vertex (Haglof Vertex III, Langsele, Sweden), or a Laser Range Meter (Bosch GLM 50 C, Leinfelden-Echterdingen, Germany). For short trees, a telescopic measuring pole was used. Height was noted to the nearest 0.1 m. To avoid errors introduced by measuring height on sloping ground, height measurements on slopes were conducted from the same elevation as the tree's base by approaching the tree sideways. Where this was not possible, a slope correction factor was used.

Crown size (m²)
The crown size was measured as the circular and elliptical plane area of the crown. For this, we measured two perpendicular crown diameters (canopy 1 and 2) with a measuring tape, with the first measurement being made along the longest axis of the crown, from one edge to the other, and by visually projecting the crown margin onto the ground to the nearest decimeter. For the ellipse area, we calculated [...]
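
The formulas themselves are not reproduced in this text. As an illustration, the area of an ellipse whose axes are the two measured crown diameters is π × (canopy 1 / 2) × (canopy 2 / 2). The circular variant below, which uses the mean of the two diameters, is an assumption of this sketch and may differ from the authors' definition.

```python
import math


def crown_area_ellipse(canopy1_m, canopy2_m):
    """Area (m²) of an ellipse with the two crown diameters as its axes."""
    return math.pi * (canopy1_m / 2) * (canopy2_m / 2)


def crown_area_circle(canopy1_m, canopy2_m):
    """Area (m²) of a circle whose diameter is the mean of the two diameters.

    Assumed definition for the circular projection; not taken from the source.
    """
    mean_diameter = (canopy1_m + canopy2_m) / 2
    return math.pi * (mean_diameter / 2) ** 2
```

For diameters of 4 m and 2 m the ellipse area is 2π ≈ 6.28 m², while the circle on the mean diameter gives about 7.07 m²; the two coincide only when both diameters are equal.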

Number of fruits (units)
In conifers, cones were counted by an observer on the ground using binoculars, and the average of three rounds of counting was recorded. Only mature (brown) and closed cones were counted, i.e., those containing seeds, and not immature (green) or open cones, whose seeds had already been dispersed (open cones often stay on the branch for several years after seeds are dispersed). In broadleaves, the number of fruits was counted for 30 seconds, and the procedure was repeated three times to average the three counts.

In the case of species with very small fruits that are hard to see individually and in locations with a very limited view of the canopy, each tree was assigned to one of five categories, namely 0 (no fruits), 1 (a few fruits in a small section of the crown), 2 (a few fruits in two or more sections of the crown), 3 (a lot of fruits in a small section of the crown), and 4 (a lot of fruits in two or more sections of the crown).

Straightness
Straightness of the stem was classified according to five levels: (1) [...]

Modeled environmental data extracted for GenTree sites
Topography, soil, and climate data were compiled to characterize environmental conditions in each GenTree sampling site as follows.

Topography
We used the European digital elevation model to describe topographic conditions at 25 m spatial resolution with a vertical accuracy of about ±7 m (EU-DEM v. 1.1 from the Copernicus program; https://land.copernicus.eu/). We derived 14 variables (Table 2) based on biological hypotheses and their informative power at the local scale [25]. We calculated morphometric, hydrologic, and radiation grids for each GenTree site and visually inspected data integrity using SAGA 6.2 [26] (details in Table 2).

Soil
We collected available data on water capacity at seven soil depths using SoilGrids250m [27]. We estimated Pearson's correlation coefficients, r, between soil layers and then averaged the first four superficial layers (0, 5, 15, and 30 cm) and the three deeper layers (60, 100, and 200 cm) separately, because the layers within each group were highly correlated.
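
The two steps described here, checking correlation between layers and then averaging correlated layers, can be sketched in plain Python. The function names and the input layout (one list of per-site values for each soil depth) are assumptions of this example; the source does not state which software was used for this step.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_across_layers(layers):
    """Per-site mean over several soil layers.

    `layers` is a list with one sequence of per-site values for each depth.
    """
    return [sum(values) / len(values) for values in zip(*layers)]
```

In this layout, the superficial value for each site would be `mean_across_layers` applied to the four lists for 0, 5, 15, and 30 cm, and the deep value the same call on the lists for 60, 100, and 200 cm.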

Climate
We extracted climate data with a high spatial resolution (30 arcsec) using CHELSA v. 1. [...]

The local environmental contrasts varied among species and population pairs, most of which exhibited variability concerning elevation, temperature, precipitation, and water availability. Other local contrasts were based on radiation, soil water capacity, and topographic wetness index (among others). One special case is Populus nigra, a heliophilous pioneer species found naturally in riverine areas. Given this specific habitat, local contrasts were largely bound to the distance of the individual trees from the riverbed and thus, for example, to groundwater access or exposure to variation in the intensity and frequency of floods.