Recent range shifts of moths, butterflies, and birds are driven by the breadth of their climatic niche

Abstract Species are altering their ranges as a response to climate change, but the magnitude and direction of observed range shifts vary considerably among species. The ability to persist in current areas and colonize new areas plays a crucial role in determining which species will thrive and which decline as climate change progresses. Several studies have sought to identify characteristics, such as morphological and life-history traits, that could explain differences in the capability of species to shift their ranges together with a changing climate. These characteristics have explained variation in range shifts only sporadically, thus offering an uncertain tool for discerning responses among species. As long-term selection to past climates have shaped species’ tolerances, metrics describing species’ contemporary climatic niches may provide an alternative means for understanding responses to on-going climate change. Species that occur in a broader range of climatic conditions may hold greater tolerance to climatic variability and could therefore more readily maintain their historical ranges, while species with more narrow tolerances may only persist if they are able to shift in space to track their climatic niche. Here, we provide a first-filter test of the effect of climatic niche dimensions on shifts in the leading range edges in three relatively well-dispersing species groups. Based on the realized changes in the northern range edges of 383 moth, butterfly, and bird species across a boreal 1,100 km latitudinal gradient over c. 20 years, we show that while most morphological or life-history traits were not strongly connected with range shifts, moths and birds occupying a narrower thermal niche and butterflies occupying a broader moisture niche across their European distribution show stronger shifts towards the north. Our results indicate that the climatic niche may be important for predicting responses under climate change and as such warrants further investigation of potential mechanistic underpinnings.


Figures S1-S8
Tables S1-S6 Text S1.Distribution data and delimitation We started off with a selection of 244 moth and 91 butterfly species.These moths and 45 of the butterfly species were selected for a previous study on phenology and range shifts based on the availability of adequate numbers of systematically collected monitoring and trap dates (Hällfors et al., 2021).Although we do not use those moth trap data in this study, we opted for using the same moth species here as in Hällfors et al. (2021) allowing direct comparisons between the studies, and as this set covers the most common and abundantly occurring species in Finland.The lepidoptera species in this study cover almost 80% of the butterfly species ever observed in Finland and circa 27% of moths commonly monitored in Finland.We excluded five butterfly species and one moth species that are migratory and do not have a permanent breeding population in Finland (Pieris rapae, P. brassicae, Vanessa atalanta, V. cardui, Colias hyale, and Autographa gamma), leaving us with 243 moth and 86 butterfly species at this stage.For these species, we sourced observations that were available in the Insect database and National Butterfly Monitoring Scheme (NAFI; Saarinen et al., 2003), through the Finnish Biodiversity Information Facility (FinBIF) in December 2019 (moths) and June 2022 (butterflies) (Text S2).The data were divided into two five-year periods: 1992-1996 (hereafter T1) and 2013-2017 (hereafter T2) and converted into presence-only data for each 10 x 10 km grid square (119 621 observations).We excluded two moth and 23 butterfly species that had been observed in less than 20 grid cells in either one of the time periods (117 486 observations).The total number of presence squares at T2 was substantially higher than at T1 due to increased sampling effort over time.To account for the change in sampling effort, we divided the data into five latitudinal zones (Fig. S1) and randomly subsampled the pooled observations of all species in T2 so that the number of observations in T2 matched the number of observations in T1 within the latitudinal zone.This was repeated five times leaving us with five subsets with a total of 78 380 observations in each for these 241 moth and 63 butterfly species.For a comparison of subsamples across the latitudinal zones, see Fig. S2.
For birds, we used distribution data on terrestrial breeding birds, sourced from three national bird atlases through the Finnish Natural History Museum (Text S2).These atlases have been compiled from national bird surveys carried out during 1974-1979, 1986-1989 and 2006-2010, respectively.The three atlases contain an index of breeding probability (ranging from 0 = not found; to 4 = confirmed breeding) for each bird species on a 10 x 10 km uniform grid that covers the entire area of Finland (3813 grid squares; (Väisänen et al., 1998)).We took breeding probability of 0 (not found) to represent absence and all other classes to represent presences.As the third atlas has been surveyed more extensively in comparison to first two atlases, and following earlier practices, we used the pooled first and second atlas data, covering the time period 1974-1989 (hereafter T1) and compared this to the third (2006)(2007)(2008)(2009)(2010) atlas (hereafter T2) to reduce potential observation biases due to differences in survey effort (Kujala et al. 2013;Virkkala & Lehikoinen, 2017).We excluded all waterfowl and species that had been observed breeding in less than 20 grid cells in either one of the time periods leaving us with 177 bird species.
Finally, we harmonized the distribution extents of the study species and included only species that likely have their northern range borders in Finland.In Finland, moths, butterflies, and breeding birds include several arctic and subarctic species that predominantly occupy the northern parts of the country and for which the leading distribution edge is close to or outside the country borders.Consequently, we removed two moth, six butterfly and 90 bird species for which the centre point of distribution in Finland in T1 was ≥ 7 000 000 north in the Finnish uniform coordination system (63º4' N in degrees; range in Finland 59º46' -70º5'N; Fig. S1) (Brommer, 2004;Brommer et al., 2012;Kujala et al., 2013).For birds, the centre point was weighted by their breeding category in T1.This allowed us to focus on the predominantly southern species with leading distribution edges in Finland.The consequent data used for the main analyses thus consisted of 383 species: 239 species of moths, 57 species of butterflies, and 87 species of birds for which the shift in the northern facets of distribution between two periods of time 17 years apart were measured.

Text S3. Data used to arrive at climatic niche metrics.
For butterflies, we employed metrics provided by Schweiger et al. (2014), based on the distribution atlas on butterflies in Europe.For birds, we used distribution data from BirdLife International (BirdLife International, 2020).These data are based on a variety of sources and give different levels of site occupancy.We included the location where each species is extant or probably extant, native or reintroduced, and known or thought very likely to be resident or occur during the breeding season.For moths we used available digitized atlas information on geometric moths from Heidrich et al., (2018) which are based on The Geometric Moths of Europe volumes 1-4 (Hausmann, 2001(Hausmann, , 2004;;Mironov, 2003;Hausmann & Viidalepp, 2012;Hausmann et al., 2012.) and digitized atlas maps on 176 other species based on printed atlases (Fibiger 1990, 1993, 1997, 2009, Fibiger et al. 1995, 2007, 2010;de Freina & Witt, 1987, 1990;Gouter et al. 2003;Hacker et al. 2002;Müller et al. 2019;Ronkay et al 1994Ronkay et al , 2001;;Skou & Sihvonen, 2015;Zilli et al. 2005).The atlas data were overlaid on the CGRS grid (Common European Chorological Grid Reference System from the European Environment Agency).We used interpolated climate data on the same CGRS grid (originally developed in the ALARM project (Settele et al., 2005;Fronzek et al., 2012) and parameters summarized by Schweiger et al. (2014)) to calculate climatic niche metrics for each species.Interrelationship of climatic niche metrics and four examples of atlas data in Fig. S3.

Text S4. Data sources for traits, habitat, and range size
Trait data on birds were based on Solonen (1985) and Cramp et al. (1994) while data on Lepidoptera were collated from several sources (Mikkola & Jalas, 1977, 1979;Mikkola et al., 1985Mikkola et al., , 1989;;Marttila et al., 1991Marttila et al., , 1996;;Jalas, 1992;Silvonen et al., 2014;Middleton-Weilling et al. 2020).Size was measured as the total wingspan (in mm) of females for moths, as wing index for butterflies, and as mean mass for birds.Bird species that are multi-brooded in southern Finland and single-brooded in northern Finland were categorized as having two or more broods.Voltinism in Lepidoptera were combined into two levels: Semiand univoltine species = one or less, and multivoltine species (including bivoltine species) = two or more (Pöyry et al., 2017).See Table S1 for number of species in categorical traits groups.From the distributional data used for calculating climatic niche metrics, we also derived another ecological attribute: range size.

Text S5. Alternative models.
In addition to the main model testing the effect of mean and SD of MAT and mean and SD of SWC on a shift in the 0.9 quantile, we tested these three alternative models on each taxonomic group to assess the robustness of our results particularly in relation to alternative metrics for or range shift, niche breadth, and the thermal and moisture niche.These alternative models were: 1) the effect of mean and SD of MAT and SWC and traits on the 0.75 quantile (i.e. the same model as the main model, but for shift in the 0.75 quantile as opposed to the 0.9 used in the main model), 2) the effect of mean and CV (relative niche breadth, as opposed to SD which describes the absolute niche) of MAT and SWC and traits on the 0.9 quantile, and 3) the effect of mean and SD of GDD and PREC and traits instead of MAT and SWC on the 0.9 quantile.

Results of alternative models
When comparing the main model (using 0.9 as range shift, SD to describe niche breadth, and MAT and SWC to describe the thermal and moisture niche, respectively) to the three alternative models we found mostly similar results but also some differences.
For moths, the best alternative models were identical to the main model (Tables S2a, S3a, S4a, and S5a), i.e., the best model all contained mean thermal niche, thermal niche breadth, and wintering.The direction and relative size of the estimates as well as added information value of the variables were also the same (Table S6a).
For birds, the best alternative models 1 and 3 were identical to the main model (Tables S2b, S3b, S4b, and S5b), i.e., they all contained the same variables thermal niche breadth, wintering and number of brood.Alternative model 2 for birds, however, contained only thermal niche breadth (Table S3b).The direction of the variable estimates was the same in all models, but thermal niche breadth did not improve model fit in alternative model 2 (Table S6).In the main model and alternative model 1 and 3 for birds, the direction and relative size of the estimates as well ads the added information value for the variables were similar.
For butterflies, only alternative model 3 produced the same qualitative and similar quantitative result, with the moisture niche breadth having a positive effect on range shifts (Tables S2c, S3c, S4c, S5c, and S6c).In alternative models 2 and 3 mean moisture niche was the only variable selected for the best explaining model selected (Tables S3c and S5c) with a wetter niche indicating larger range shifts, but only significantly so for alternative model 2.

Model performance of alternative models
We assessed model performance (of the final model) by both visualizing and testing normality and heteroscedasticity of model residuals, collinearity (variance inflation factors), and influential outliers using the performance package in R (Lüdecke et al. 2021.We used these plots together with formal tests implemented in the performance package to evaluate model appropriateness. For testing normality of residuals, we used the check_normality function in the performance package uses the Shapiro test where a p-value of 0.05 indicates deviation from normal distribution.For our alternative models 1 the normality test resulted in p-values below 0.001 for all species groups.Although the package developer notes that "this formal test almost always yields significant results for the distribution of residuals and visual inspection (e.g.Q-Q plots) are preferable", the visual inspection of residual normality (not shown) also indicate non-normality of residuals, wherefore we cannot trust that these models do violate the normality assumption of a linear model.Also, no transformation (log, cube, scale, Yeo-Johnson) of the dependent variables improved this metric.Therefore, we caution against definitive interpretations based on the alternative model 1.For alternative models 2 and 3 the normality test resulted in p-values of 0.040 and 0.043, 0.042 and 0.040, and 0.475 and 0.465 for the moth, bird, and butterfly alternative models 2 and 3, respectively.Since the residuals of the moth and bird models upon visual inspection were relatively well distributed along the qq-line and normal density distribution (not shown) and the p-values are close to 0.05, we consider the models not to violate the normality assumption of a linear model.
For heteroscedasticity tests, we employed the check_ heteroscedasticity function in the performance package which uses the Breush-Pagan test where a p-value < 0.05 indicates heteroscedasticity.For alternative models 1, this test resulted in p-values of 0.112, 0.545, and <0.001 for the moth, bird, and butterfly models respectively.Therefore, we refitted the butterfly model using a linear regression that implements robust standard errors, namely the lm_robust function as implemented in the estimatr package (Blair et al. 2022).This approach adjusts standard errors to account for heteroscedasticity and is thus a more conservative approach in cases where the residuals show uneven patterns (Astivia & Zumbo 2019).The heteroscedasticity test resulted in p-values of 0.107 and 0.284, 0.089 and 0.397, and 0.893 and 0.667 for the moth, bird, and butterfly alternative models 2 and 3, respectively, indicating no heteroscedasticity in these models.
To test for collinearity among model variables we used the check_collinearity function in the performance package.The VIF values ranged between 1.05-1.10for the moth alternative model 1.The bird and butterfly alternative models 1 contained only one variable, wherefore a collinearity test was not relevant.VIF values ranged between 1.10-1.13and 1. 07-1.71, 1.19-1.23 and 1.24-1.26for the moth and bird alternative models 2 and 3, respectively.The butterfly alternative models 2 and 3 contained only one variable, wherefore a collinearity test was not relevant.Therefore, we can conclude that there were no multicollinearity issues in any of the alternative models.We used the check_outliers as implemented in the performance package to check for influential outliers.No outliers were detected in the bird and moth models, but there was one outlier (case 41) detected in the alternative model 2 and 3 for butterflies.

Discussion on alternative models
The alternative models resulted in qualitatively similar results (see Text S5).Using change in the 0.75 quantile to measure range shifts (alt.model 1) resulted in differences for the bird (only thermal niche breadth had explanatory power, but not migratory behavior and number of broods) and butterfly model (mean moisture niche had explanatory power as opposed to moisture niche breadth).This indicates that, at this scale of distribution shifts, the same mechanism cannot be identified to relate with range expansions or shifts as compared to the 0.9 quantile that better mirrors the leading range edge.
Standard deviation (a proxy for absolute niche breadth) and coefficient of variation (a proxy for relative niche breadth; alt.model 2) showed, overall, a similar effect on range shifts for all studies species groups, thus pointing to a lack of difference in the effect of relative versus absolute niche breadth at least at this scale.Using other climatic metrics to describe the thermal and moisture niche (alt.model 3) indicated that the mean moisture niche had explanatory power as opposed to moisture niche breadth for butterflies but there was weak statistical support for this variable having any connection to range shifts overall.

Figure S1. Latitudinal areas used for equalizing presence data on Lepidoptera within between the two periods.
The total number of presence central distribution point squares for T2 was substantially higher than for T1 due to overall increased sampling effort over time.To avoid effects caused by differences in overall observation intensity, we divided the data into these five latitudinal zones and randomly subsampled the pooled observations of all species at T2 across the 304 Lepidoptera species that had at least 20 occupied grid cells per period, so that the total number of observations matched the number of observations at T1 within each latitudinal zone.The number of observations at T1 versus T2 shown in the grey boxes alongside the map.Latitudes and longitudes in projection format ETRS-TM35FIN, EPSG:3067.Each panel represents a latitudinal zone (Fig. S1).Different colors represent the five different subsamples and each point in a subsample represents one out of the 304 Lepidoptera species.b) Percent of points included in the subsampled compared to the number of points in the raw data.Horizontal jitter has been introduced to points in a) and b) to increase interpretability of overlapping values.c) Number of subsampled points in subsamples 2-5 compared to subsample 1 across all subsampling zones.Identity line (solid black line) added to all plots to ease interpretability.Overall, the absolute difference between raw and sampled data, as well as between samples, was greatest for species with many distribution points (a and c) and this difference was larger in zones where the total number of points was smaller (a).The percentage of included distribution points was most variable for species with low numbers of distribution points (b).Our approach to subsample within latitudinal zones removes variation caused by changes in effort that vary across space.Our approach using five-fold subsampling, analyzing each sample independently with quantitative regression and then averaging the estimates, reduces sampling-based stochasticity and imprecision and therefore results in a more conservative overall estimate.

Figure S3. Relationship between mean and standard deviation for climate niche metrics (a-d) and examples of data used to derive these (e-h).
The metrics shown in panels (a-d) were used to calculate the coefficient of variation (cv), for a) mean annual temperature (MAT) b) growing degree days above 5°C between January-August (GDD), c) annual precipitation sum (PREC), d) soil water content (SWC).Circle size gives CV for each species.Colored points show the metric values of the four example species in (e-h) on a backdrop of all species in this study.Panels (e-h) show examples atlas data used to calculate the climatic niche metrics moth species: e) Gymnoscelis rufifasciata with a large range and relatively southern 0.9 quantile at T1, f) Idaea emarginata with a small range and relatively southern 0.9 quantile at T1 g) Plemyria rubiginata with a large range and relatively northern 0.9 quantile at T1, and h) Venusia cambrica with a small range and relatively northern 0.9 quantile at T1.The atlas data for these moth species are published in Heidrich et al. (2018).Climatic niche metrics and range size were derived from atlas data (Text S3) while the position of the 0.9 quantile in T1 was estimated based on distribution data (Texts S1 and S2).
Reference: Heidrich, L., Friess, N., Fiedler, K., Brändle, M., Hausmann, A., Brandl, R., et al. (2018) The dark side of Lepidoptera: Colour lightness of geometrid moths decreases with increasing latitude.Global Ecology and Biogeography, 27, 407-416.Thus, we see some indications that species with a lower northern range edge in T1 tended to move further north.Rapoport's rule (Stevens, 1989) states that species at higher latitudes have tend to have broader ranges (and thus potentially broader niches) than species occurring closer to the equator.Yet, among our study species, those species with warmer thermal niches tended to also have broader niches (Fig. S3).At first glance, our data thus seems not to support Rapoport's rule.However, this effect is likely caused by the delimitation of our study area.Namely, we only examined species occurring in Finland but measured their climatic niches from across the whole of Europe.Because of this, the "warmer" the species is, the larger its range needs to be within Europe in order to both occur within the borders of Finland and to occupy warmer conditions.Therefore, the niche breadth, which would be narrower for species at lower (warmer) latitudes, is potentially a more proximate explanation for why we observe warmer species moving to higher latitudes at higher rates than species inhabiting colder areas.S8 show residual normality plots (a-f) and variance inflation plots (g-i) for the main models for moths (a, d, and g), birds (b, e, and h), and butterflies (c, f, and i).We used these together with formal tests implemented in the performance package to evaluate model appropriateness.For testing normality of residuals the check_normality function in the performance package uses the Shapiro test where a p-value of 0.05 indicates deviation from normal distribution.For our main models the test resulted in p-values of 0.040, 0.043, and 0.430 for the moth, bird, and butterfly model, respectively.The package developer, however, note that "this formal test almost always yields significant results for the distribution of residuals and visual inspection (e.g.Q-Q plots) are preferable".Since the residuals of the moth and bird models upon visual inspection are relatively well distributed along the qq-line and normal density distribution and the p-values are close to 0.05, we consider the models not to violate the normality assumption of a linear model.Also, no transformation (log, cube, scale, Yeo-Johnson) of the dependent variables improved this metric.For heteroscedasticity tests, we employed the check_ heteroscedasticity function in the performance package which uses the Breush-Pagan test where a p-value < 0.05 indicates heteroscedasticity.For our main full models, this test resulted in p-values of 0.107, 0.104, and 0.964 for the moth, bird, and butterfly models respectively.To test for collinearity among model variables we used the check_collinearity function in the performance package.The VIF values ranged between 1.10-1.14, and 1.19-1.23 for the moth and bird models, respectively.The butterfly model contained only one variable, wherefore a collinearity test was not relevant.Therefore, we can conclude that there were no multicollinearity issues in our full models.We used the check_outliers as implemented in the performance package to check for influential outliers and it revealed that no outliers were detected in the moth or bird models, but one outlier (case 41) was identified as an outlier with a cook value of 0.7 in the butterfly model.

References:
Astivia, Oscar L. Olvera and Zumbo, Bruno D. ( 2019 Table S3.Results of the best subsets regression of the alternative model 1 comparing all possible combinations of our eight explanatory variables (2 8 = 256 potential models).This method provides a list of the best fitting models for models with one, two, three …8 variables, based on a balance of several criteria like R 2 and AIC.The produced list of best fitting models includes a diverse range of such model fit metrics for the user to use for further evaluations.AIC: Akaike Information Criteria; SBC: Schwarz Bayesian Criteria; MSEP: Estimated error of prediction, assuming normality; FPE: Final Prediction Error; HSP: Hocking's Sp; APC: Amemiya Prediction Criteria.From this list we chose the best explaining model based on the lowest AIC value (Akaike's Information Criterionwhile also considering increase in adjusted and predicted R 2 .However, in order to choose a less parsimonious model (a model with more variables) the AIC needed to be at least 2 units lower since models with AIC values differing with less than 2 units can be considered to have the same information value.With this we sought to identify a minimal model that fits the data best and thereby identify the variables that help explain observed range shifts.Alternative model 3 where we used the GDD instead of MAT to describe the thermal niche and PREC instead of SWC to describe the moisture niche.

Figure S2 .
Figure S2.Number of distribution points of Lepidoptera included in the subsample for T2 compared to the raw number of distribution points in T2 shown for each latitudinal zone.a) Number of subsampled points compared to the number of points in the raw data with a fitted GAM line across each subsample.Each panel represents a latitudinal zone (Fig.S1).Different colors represent the five different subsamples and each point in a subsample represents one out of the 304 Lepidoptera species.b) Percent of points included in the subsampled compared to the number of points in the raw data.Horizontal jitter has been introduced to points in a) and b) to increase interpretability of overlapping values.c) Number of subsampled points in subsamples 2-5 compared to subsample 1 across all subsampling zones.Identity line (solid black line) added to all plots to ease interpretability.Overall, the absolute difference between raw and sampled data, as well as between samples, was greatest for species with many distribution points (a and c) and this difference was larger in zones where the total number of points was smaller (a).The percentage of included distribution points was most variable for species with low numbers of distribution points (b).Our approach to subsample within latitudinal zones removes variation caused by changes in effort that vary across space.Our approach using five-fold subsampling, analyzing each sample independently with quantitative regression and then averaging the estimates, reduces sampling-based stochasticity and imprecision and therefore results in a more conservative overall estimate.

Figure S4 .
Figure S4.Shift in range based on metrics describing the range position and extent of species.a) 0.75 and 0.9 quantiles of the distribution along latitudinal gradient and across taxonomic groups.N= 383 species (Moths= 239; Butterflies= 57; Birds= 87).Grey lines = CI for the estimated difference in quantile between T1 and T2 includes zero; orange and purple CI for a positive (yellow) or negative (purple) estimated difference in quantile between T1 and T2 does not include 0. Latitudes in projection KKJ / Finland Uniform Coordinate System, EPSG:2393.Univariate plots for the effect of b) mean thermal niche, c) breadth of thermal niche, and d) latitudinal position of range edge at T1 on shift in 0.9 quantile between T1 and T2 (range shift; latitudes in projection KKJ / Finland Uniform Coordinate System, EPSG:2393.The fitted lines in b)-d) are linear regressions based on a univariate linear models and points are raw data.MAT= Mean annual temperature.Model comparison of univariate models b)-d) using AIC for moths indicated that latitudinal range edge position at T1 (AIC= 6239.54)explained the data better then mean thermal niche (AIC=6380.90)or thermal niche breadth (AIC=6375.11).For birds, thermal niche breadth (AIC= 977.59) and latitudinal range edge position at T1 (978.15) had similar explanatory power and both explained the data slightly better than mean thermal niche (AIC =981.42).For butterflies, the thermal niche breadth (AIC = 3448.89)explained the data better than did mean thermal niche (AIC = 3450.64)or latitudinal position at T1 (AIC = 3451.27).
.C.(1989)  The Latitudinal Gradient in Geographical Range: How so Many Species Coexist in the Tropics.The American Naturalist, 133.

Figure S5 .
Figure S5.Shift in 0.75 (left-hand side) and 0.9 quantiles (right-hand side) of the distribution along the latitudinal gradient per taxonomic group.Birds= 87; Butterflies= 57; Moths= 239.Grey lines = CI for the estimated difference in quantile between T1 and T2 includes zero; orange and purple CI for a positive (yellow) or negative (purple) estimated difference in quantile between T1 and T2 does not include 0. Latitude in projection projection KKJ / Finland Uniform Coordinate System, EPSG:2393.

Figure S6 .
Figure S6.Correlation matrix of all explanatory variables.Created using the pairs.panelfunction in the psych package (Revelle 2022).MAT= Mean annual temperature; GDD= Growing degree days above 5 until august; PREC= annual precipitation sum; SWC= Soil water content; SD= Standard deviation; CV= Coefficient of variation.

Fig. S7 .
Fig. S7.Univariate plots with for all explanatory variables included in the initial model, from which the best explaining variables were selected using model selection.In a)-h) fitted lines are linear regressions based on univariate linear models and points are raw data.MAT= Mean annual temperature; SWC= Soil water content; SD= Standard deviation; CV= Coefficient of variation.

Figure S8 .
Figure S8.Model performance of main final models (presented in the main text).We assessed model performance (of the final model) by both visualizing and testing normality and heteroscedasticity of model residuals, collinearity (variance inflation factors), and influential outliers using the performance package in R (Lüdecke et al. 2021.FigureS8show residual normality plots (a-f) and variance inflation plots (g-i) for the main models for moths (a, d, and g), birds (b, e, and h), and butterflies (c, f, and i).We used these together with formal tests implemented in the performance package to evaluate model appropriateness.For testing normality of residuals the check_normality function in the performance package uses the Shapiro test where a p-value of 0.05 indicates deviation from normal distribution.For our main models the test resulted in p-values of 0.040, 0.043, and 0.430 for the moth, bird, and butterfly model, respectively.The package developer, however, note that "this formal test almost always yields significant results for the distribution of residuals and visual inspection (e.g.Q-Q plots) are preferable".Since the residuals of the moth and bird models upon visual inspection are relatively well distributed along the qq-line and normal density distribution and the p-values are close to 0.05, we consider the models not to violate the normality assumption of a linear model.Also, no transformation (log, cube, scale, Yeo-Johnson) of the dependent Figure S8.Model performance of main final models (presented in the main text).We assessed model performance (of the final model) by both visualizing and testing normality and heteroscedasticity of model residuals, collinearity (variance inflation factors), and influential outliers using the performance package in R (Lüdecke et al. 2021.FigureS8show residual normality plots (a-f) and variance inflation plots (g-i) for the main models for moths (a, d, and g), birds (b, e, and h), and butterflies (c, f, and i).We used these together with formal tests implemented in the performance package to evaluate model appropriateness.For testing normality of residuals the check_normality function in the performance package uses the Shapiro test where a p-value of 0.05 indicates deviation from normal distribution.For our main models the test resulted in p-values of 0.040, 0.043, and 0.430 for the moth, bird, and butterfly model, respectively.The package developer, however, note that "this formal test almost always yields significant results for the distribution of residuals and visual inspection (e.g.Q-Q plots) are preferable".Since the residuals of the moth and bird models upon visual inspection are relatively well distributed along the qq-line and normal density distribution and the p-values are close to 0.05, we consider the models not to violate the normality assumption of a linear model.Also, no transformation (log, cube, scale, Yeo-Johnson) of the dependent

Table S2 . Results of the best subsets regression of the main model comparing all possible combinations of our eight explanatory variables (2 8 = 256 potential models).
This method provides a list of the best fitting models for models with one, two, three …8 variables, based on a balance of several criteria like R 2 and AIC.The produced list of best fitting models includes a diverse range of such model fit metrics for the user to use for further evaluations.AIC: Akaike Information Criteria; SBC: Schwarz Bayesian Criteria; MSEP: Estimated error of prediction, assuming normality; FPE: Final Prediction Error; HSP: Hocking's Sp; APC: Amemiya Prediction Criteria.From this list we chose the best explaining model based on the lowest AIC value (Akaike's Information Criterionwhile also considering increase in adjusted and predicted R 2 .However, in order to choose a less parsimonious model (a model with more variables) the AIC needed to be at least 2 units lower since models with AIC values differing with less than 2 units can be considered to have the same information value.With this we sought to identify a minimal model that fits the data best and thereby identify the variables that help explain observed range shifts.

where we used the 0.75 quantile of distribution points to measure shift in northern range edge Table S4. Results of the best subsets regression of the alternative model 2 comparing all possible combinations of our eight explanatory variables (2 8 = 256 potential models).
This method provides a list of the best fitting models for models with one, two, three …8 variables, based on a balance of several criteria like R 2 and AIC.The produced list of best fitting models includes a diverse range of such model fit metrics for the user to use for further evaluations.AIC: Akaike Information Criteria; SBC: Schwarz Bayesian Criteria; MSEP: Estimated error of prediction, assuming normality; FPE: Final Prediction Error; HSP: Hocking's Sp; APC: Amemiya Prediction Criteria.From this list we chose the best explaining model based on the lowest AIC value (Akaike's Information Criterionwhile also considering increase in adjusted and predicted R 2 .However, in order to choose a less parsimonious model (a model with more variables) the AIC needed to be at least 2 units lower since models with AIC values differing with less than 2 units can be considered to have the same information value.With this we sought to identify a minimal model that fits the data best and thereby identify the variables that help explain observed range shifts.Tmean = mean thermal niche; Tbreadth = breadth of thermal niche; Mmean = mean moisture niche; Mbreadth = breadth of moisture niche; Wintering = overwintering mode; Numb.Gen. = number of generations or broods per season; Size = body size; Range size = range size across Europe.

where we used the CV (realtive niche breadth) instead of SD (absolute niche breadth) to measure niche breadth Table S5. Results of the best subsets regression of the alternative model 3 comparing all possible combinations of our eight explanatory variables (2 8 = 256 potential models).
This method provides a list of the best fitting models for models with one, two, three …8 variables, based on a balance of several criteria like R 2 and AIC.The produced list of best fitting models includes a diverse range of such model fit metrics for the user to use for further evaluations.AIC: Akaike Information Criteria; SBC: Schwarz Bayesian Criteria; MSEP: Estimated error of prediction, assuming normality; FPE: Final Prediction Error; HSP: Hocking's Sp; APC: Amemiya Prediction Criteria.From this list we chose the best explaining model based on the lowest AIC value (Akaike's Information Criterionwhile also considering increase in adjusted and predicted R 2 .However, in order to choose a less parsimonious model (a model with more variables) the AIC needed to be at least 2 units lower since models with AIC values differing with less than 2 units can be considered to have the same information value.With this we sought to identify a minimal model that fits the data best and thereby identify the variables that help explain observed range shifts.Tmean = mean thermal niche; Tbreadth = breadth of thermal niche; Mmean = mean moisture niche; Mbreadth = breadth of moisture niche; Wintering = overwintering mode; Numb.Gen. = number of generations or broods per season; Size = body size; Range size = range size across Europe.