Phylogenetic patterns suggest frequent multiple origins of secondary metabolites across the seed-plant ‘tree of life’

Abstract To evaluate the phylogenetic patterns of the distribution and evolution of plant secondary metabolites (PSMs), we selected 8 classes of PSMs and mapped them onto an updated phylogenetic tree including 437 families of seed plants. A significant phylogenetic signal was detected in 17 of the 18 tested seed-plant clades for at least 1 of the 8 PSM classes using the D statistic. The phylogenetic signal, nevertheless, indicated weak clustering of PSMs compared to a random distribution across all seed plants. The observed signal suggests strong diversifying selection during seed-plant evolution and/or relatively weak evolutionary constraints on the evolution of PSMs. In the survey of the current phylogenetic distributions of PSMs, we found that multiple origins of PSM biosynthesis due to external selective forces for diverse genetic pathways may have played important roles. In contrast, a single origin of PSMs seems rather uncommon. The distribution patterns for PSMs observed in this study may also be useful in the search for natural compounds for medicinal purposes.


INTRODUCTION
Plant secondary metabolites (PSMs) are ubiquitous in plants [1]. They play diverse ecological and physiological roles in defense against herbivores, pathogenic microbes and competing plants [2,3]. They also mediate interactions with pollinators, mycorrhizal fungi and other plants, and confer protection against abiotic stressors such as ultraviolet (UV) radiation, frost and drought [4]. PSMs are important to humans in many ways, representing a major source of medicinal drugs. Many approved drugs, and many of those currently in clinical trials, are derived from natural products, including 25% that are plant-derived [5]. PSMs are characterized by great structural diversity, with >200 000 compounds already identified and many more remaining to be discovered [1].
Despite the importance of PSMs, the patterns and processes of their diversification across seed plants have not yet been thoroughly investigated [2]. The establishment of a clear pattern of phylogenetic diversification is necessary to meaningfully understand the distribution of PSMs [6]. Using comparative phylogenetic approaches and mapping of chemical characteristics, we can identify lineages producing similar PSMs and thus gain an improved understanding of the evolution of secondary metabolic traits. Of practical importance, plants used for the treatment of ailments are often significantly clustered phylogenetically [7]; closely related plants often harbor similar chemical substances and bioactivities [8]. The understanding associated with such findings could assist in the process of engineering biosynthetic pathways to obtain new biologically active and pharmaceutically relevant compounds.
The continuous development of more well-supported plant phylogenies, improved knowledge of plant chemistry and the application of advanced statistical tools and methods have allowed us to resolve some of the patterns in PSM distribution within and between different plant clades. Over the past decade, improvements in DNA sequencing and analytical methods have revolutionized our understanding of seed-plant phylogeny [9]. These advances have allowed us to investigate large-scale patterns of trait evolution in plants, including those involving PSMs, with ever-increasing precision.
A first step in understanding PSM evolution in a phylogenetic context is to test for phylogenetic signals in various PSMs. A phylogenetic signal is the tendency of related species to resemble each other with respect to evolutionarily related traits more than would be expected for species drawn at random from a phylogenetic tree [10], indicating a relationship between the degree of phylogenetic relatedness and phenotypic similarity [11]. Measures of phylogenetic signal are widely employed in ecological and evolutionary studies in many organisms spanning a wide range of traits such as, but not limited to, ecophysiological traits, growth form, habitat and life history [12]. A significant phylogenetic signal can be maintained when closely related species respond to similar ecological pressures, maintaining their adaptive features regardless of whether the traits evolved once in a common ancestor or convergently in the course of evolution. Past studies have often shown the widespread sharing of particular PSMs among closely related plant species [13]. For example, root secondary compounds, especially phenolics, display a significant phylogenetic signal in species of Eucalyptus (Myrtaceae) [13]. Studying various members of such groups can often reveal variation in their characteristic PSM groups [14]. Similar investigations of families and higher taxonomic groups have also been fruitful, but comparisons of PSM distribution across entire clades are still in their early stages. Large-scale comparisons are of increasing interest as more is learned about the chemical groups and as larger, more inclusive phylogenies become available.
In this study, we investigate the evolution of PSM classes across seed plants at the family level, using an original data set of PSM composition across 437 plant families arrayed according to a robust phylogenetic hypothesis. More specifically, we test (i) the distribution and evolution pattern that PSMs present in the phylogeny of seed plants; (ii) whether the diversity and composition of PSMs display a significant phylogenetic signal and, if so, their evolutionary patterns; (iii) what factors determine the current PSM diversification pattern and (iv) whether PSM diversity and evolution are associated with other biotic evolutionary factors. Finally, we highlight the opportunities to apply approaches that utilize phylogenetic signals for bioprospecting.

Numbers of PSMs and their distribution in seed plants
Our analyses indicate that usable data were available for 309 of the 437 families of angiosperms and gymnosperms (Supplementary Table 2).
Phenylpropanoids were widespread in gymnosperms, but less so in angiosperms (Supplementary Table 2 and Supplementary Fig. 4), being frequently reported in only a few clades including Poales, Piperales, Ranunculales, Myrtales and Rosales (Fig. 1 and Supplementary Fig. 4).
Figure 1. Distribution of major classes of PSMs across seed-plant phylogeny. Each color represents one PSM class. The key to the different colors is provided by the circles surrounding the tree. A gray color indicates absent or missing data.
Tannins were relatively widely distributed. Among gymnosperms, they were found in Ginkgoaceae, Ephedraceae, Pinaceae and Cupressaceae. In angiosperms, they were frequently found in commelinids, Ranunculales, superrosids and Ericales, and only occasionally in the superasterids (Fig. 1 and Supplementary Fig. 6).
Terpenoids occur abundantly in several diverse groups of seed plants. In gymnosperms, they were widely distributed in conifers; in monocots, they were frequent in commelinids; and, in eudicots, they were widely present in fabids and superasterids (Fig. 1 and Supplementary Fig. 7).

Phylogenetic signal in PSMs across seed plants
All eight PSM classes tested exhibit a distribution that is significantly different from one expected under a Brownian-motion model (p Brownian motion < 0.05), while only six of the eight tested PSM classes show a distribution significantly different from a random distribution (p Random < 0.05), with steroids and phenylpropanoids being the outliers (p Random > 0.05). The remaining six PSM classes had a D value of between 0.672 and 0.798 ( Table 1), suggesting that these PSMs show a weak clustering when all seed plants are considered, with some approaching, but still significantly different from, a random distribution. Across seed plants as a whole, and the subset of angiosperms only, a significant phylogenetic signal was observed for all eight PSM classes, although some of the smaller subsets tested showed no significant phylogenetic signal for some of the eight PSMs. Of the 17 sub-clades of seed plants tested, a statistically significant phylogenetic signal was detected in 16 clades for at least one PSM (with the Lamiids being the one exception; for details, see Supplementary Table 3 and Fig. 2a). Across large clades, as the number of families increases, the number of observed cases of significant phylogenetic signal also increases; decreasing the scope from large-scale (e.g. eudicots and superasterids) to ordinal-level (e.g. Malpighiales and Caryophyllales) comparisons showed a reduction in the number of observations with a significant phylogenetic signal (Fig. 2a). 
A phylogenetic signal was observed in higher taxonomic orders such as the eudicots, core eudicots, superasterids and asterids, with all eight PSMs showing a phylogenetic signal; monocots, superrosids and rosids show a significant phylogenetic signal in seven of the eight PSMs; and, for many groups above the ordinal level, such as malvids, fabids and campanulids, and at the ordinal level (Malpighiales and Caryophyllales), a significant phylogenetic signal was observed in one to four of the eight PSMs ( Fig. 2a and Supplementary Table 3).
The taxonomic scale investigated changes the interpretation of phylogenetic signals. For seed plants, angiosperms, eudicots, core eudicots and superrosids, all PSMs show D values lower than 1, indicating weak phylogenetic clustering (0 < D < 1). Conversely, in groups such as commelinids, fabids and campanulids, only a few PSMs show a significant phylogenetic signal; most PSMs have D values close to 1, reflecting a random distribution, and a few show overdispersion with D values >1 (Supplementary Table 3 and Fig. 2a).
In general, quinones and terpenoids tend to show phylogenetic conservation across different taxonomic scales (e.g. malvids and campanulids), while steroids tend to be randomly distributed across most scales. On lower taxonomic scales, tannins appear to be randomly distributed but, at the level of eudicots and above, they appear to show weak phylogenetic clustering. Only quinones, terpenoids and phenolic acids show a strong phylogenetic signal suggesting extreme phylogenetic conservation (Supplementary Table 3 and Fig. 2a).
To test the robustness of our estimates to taxonomic sampling, we recalculated the D statistic with different subsamples, using 50%, 70%, 80% and 95% of observed cases and setting the remainder as absent or unknown. The recovered D values showed little fluctuation, and the level of significance did not change with the amount of missing or unobserved data.
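The subsampling step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `subsample_presences`, the family labels and the fixed random seed are all hypothetical, and it assumes that families dropped from the subsample are simply recoded as 0 (absent or unknown), in line with the binary coding used in this study. The D statistic would then be recalculated on each reduced data set.

```python
import random

def subsample_presences(traits, fraction, seed=0):
    """Keep a random `fraction` of observed presences; recode the rest as 0.

    `traits` maps a family name to 1 (PSM class recorded) or 0 (absent/unknown).
    The function name, seed and rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    present = sorted(family for family, state in traits.items() if state == 1)
    n_keep = round(fraction * len(present))
    kept = set(rng.sample(present, n_keep))
    return {family: int(family in kept) for family in traits}

# Hypothetical input: 20 families with a PSM class recorded, 5 without.
traits = {f"family_{i:02d}": 1 for i in range(20)}
traits.update({f"family_{i:02d}": 0 for i in range(20, 25)})

# The four sampling levels named in the text.
for fraction in (0.50, 0.70, 0.80, 0.95):
    reduced = subsample_presences(traits, fraction)
    print(fraction, sum(reduced.values()))
```

Drawing several replicates per sampling level (by varying `seed`) would give a spread of D values rather than a single estimate.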

Distribution and diversification of PSM classes in seed plants
Based on the phylogenetic placement of PSMs and applicable fossils (see below), seven of the eight classes were likely present in the common ancestor of seed plants and seven PSM classes (quinones, phenylpropanoids, flavonoids, phenolic acids, steroids, terpenoids and alkaloids) experienced rapid expansion during the radiation of the angiosperms in the Late Cretaceous and subsequent diversification (Fig. 3a). Quinones appear to have preceded the origin of chloroplasts, with ubiquinones and plastoquinone functioning respectively in mitochondrial electron transport and photosynthesis [15,16]. The broad sharing of quinones and other PSM classes among various living organisms appears to have involved symbiotic origins of mitochondria and chloroplasts, and thus subsequent differentiation of the eukaryotic groups that acquired them. For example, quinone diversification appears to have been associated with the rapid evolution that occurred relatively early in the history of seed plants (Fig. 3a and Supplementary Fig. 5). Only a few quinones are known in ferns and conifers [17], but perhaps this is due more to insufficient investigation in these groups to date than an actual lack of quinones. Quinones are clearly frequent and widely distributed in angiosperms [17], with many of them (e.g. naphthoquinones) involved in allelopathy among plants [15].
Figure 3 (partial caption). … [40]) important in pollination (Trichoptera, Coleoptera, Diptera, Lepidoptera and Hymenoptera), (c) bird diversification according to Jetz et al. [49] and (d) mammal diversification according to Stadler [61].
Phenylpropanoids likely originated in early land plants and are found only in their descendants [18,19]. Flavonoids shield photosynthesis and other metabolic activities from UV radiation, thus enabling the evolution and survival to land [19], and occur in all extant land plants, including bryophytes, pteridophytes and lycophytes [20].
Simple flavonoids, such as biflavonyls, occur in all of the previously mentioned groups as well as in gymnosperms, while complex and diverse flavonoid structures are known only in angiosperms [20]. As with quinones, most flavonoid diversification appears to have accompanied the explosive radiation of angiosperms. Steroids and phenolic acids are first known from Paleozoic fossils [18] and are ubiquitous in living plants. Understanding the patterns of variation in phenolic acids remains challenging, however, due to their structural complexity and frequent occurrence as members of chemical complexes with proteins, carbohydrates, lipids and other molecules [21]. Although phenolic acids are of special interest because of their wide occurrence in foods obtained from plants, including fruits, vegetables, coffee, wine, beer and olive oil, our comparative analysis of their distribution remains incomplete. One hypothesis is that they likely occur in far more than the 167 families in which they have been detected thus far. The limited distribution of phenolic acids in gymnosperms (6 of 12 families) may also be an artifact of insufficient detection analyses (Supplementary Table 2). According to the available data, steroids are distributed sporadically among different groups of seed plants (Fig. 1 and Supplementary Fig. 8) and are more diverse in eudicots than in magnoliids.
Terpenoids are found in essentially all living organisms and are ubiquitous in seed plants ( Fig. 1 and Supplementary Fig. 7); familiar examples with significant functions include chlorophyll, abscisic acid and gibberellins. These compounds are responsible for most of the fragrances that plants produce and thus are common in everyday products. Terpenoids were likely present in the common ancestor of all land plants, with Otto et al. [22] reporting terpenoids from Eocene and Miocene conifer fossils. Terpenoids also occur in Ginkgo biloba, whose origins can be traced back to the Jurassic period [23].
Alkaloids represent yet another class of PSMs that occur in all major groups of organisms [24] but became extraordinarily diverse as the angiosperms radiated ( Fig. 1 and Supplementary Fig. 1). The recorded occurrence of alkaloids in vascular plants other than angiosperms is infrequent [25] but likely insufficient sampling has been conducted to date to draw significant conclusions. Judging from the diversity of alkaloids in both early angiosperms such as magnoliids and in more modern groups such as Ranunculales and superrosids ( Fig. 1 and Supplementary Fig. 1), this group of chemicals has been diversifying with angiosperms from the beginning.

Phylogenetic signals across the seed-plant tree of life
The observed significant phylogenetic signal for PSMs can be explained by phylogenetic conservatism or evolutionary constraints on the development of the compounds' biosynthetic pathways [11], implying that heredity and natural selection could bring about higher levels of phylogenetic signal. Conversely, finding a weaker phylogenetic signal than expected under Brownian motion suggests that diversifying selection may be occurring.
Overall, the phylogenetic signal of PSMs in seed plants and smaller clades shows a D statistic of between 0 and 1 (Supplementary Table 3 and Fig. 2a), indicating that PSMs are weakly clustered (D value approaching 0) or randomly distributed (D value close to 1). The current heterogeneous distribution of PSMs likely was governed by historical conditions during the evolution of the plants but the necessity for the genes cannot be ignored. Tohge et al. [26] suggested that the evolution of phenolics has been shaped by prevailing environmental conditions and ecological niches, as well as other key factors including gene duplication and cis-regulatory evolution. There are several hypotheses that help to explain the observation that PSMs have a random distribution: (i) multiple origins and the probability of horizontal gene transfer [27]; for example, diterpenoid alkaloids are currently only found in distantly related clades: Ranunculaceae (Aconitum, Delphinium, Thalictrum), Rosaceae (Spiraea japonica complex), Garryaceae (Garrya), Escalloniaceae (Anopterus) and Polygonaceae (Rumex pictus) [28]; additionally, Pichersky and Lewinsohn [29] suggested that members of Rubiaceae (Coffea arabica), Theaceae (Camellia sinensis), Aquifoliaceae (Ilex paraguariensis), Sapindaceae (Paullinia cupana), Malvaceae (Cola acuminata) and Rutaceae (Citrus spp.) evolved caffeine for pollination and seed-dispersal purposes, and many plant families contain cyanogenic glycosides for defense against herbivores due to convergent evolution; (ii) co-evolution between plant species, between plants and their herbivores, and between plants and microbes (e.g. bacteria, fungi and viruses); for example, Salazar et al. [30] found that generalist herbivores played an important role in shaping chemical diversity in the Burseraceae; and (iii) abiotic environmental conditions (e.g. nutrients, light, water) that bring about plasticity in PSM expression; for example, congeneric species in resource-limiting environments evolved low maximal growth rates and invest more in defense traits, while, in high-resource environments, species evolved high maximal growth rates, invest less in defense and are more tolerant to damage [31]. All the factors mentioned above have played important roles alongside heredity in the formation of the observed high PSM diversity.
Across seed plants as a whole, closely related species tend to have similar PSMs in the following categories: alkaloids, phenolic acids, flavonoids, quinones, tannins and terpenoids (Table 1). Most cases in which a significant phylogenetic signal was observed occur at higher taxonomic levels and exhibit a signal of weak clustering (i.e. 0 < D < 1), likely reflecting the high diversity of most PSMs. The high diversity and low signal across most PSMs are unsurprising given that seed plants are a large, globally distributed and diverse clade inhabiting every terrestrial environment available and many aquatic environments. These results indicate that these complex and diverse environments are important external factors for the formation of observed high PSM diversity. Their exceptional biochemical diversity may result from strong diversifying selection during seed-plant diversification and/or relatively weak evolutionary constraints on PSMs [32].
The finding of a significant phylogenetic signal for certain PSM classes in specific clades likely reflects strong conservation of PSM compounds, bursts of diversification, or both. Distinguishing between these two possibilities is difficult with the approach taken here, because different processes and rates can produce similar observed phylogenetic signals [33]. For example, terpenoids exhibit weak phylogenetic clustering in rosids (D = 0.292) and asterids (D = 0.641; Fig. 2a and Supplementary Table 3), which can be partially explained by the abundance of diterpenes in rosids [34] and the abundance of iridoids in asterids [35]. Other PSM classes show similar significant patterns, such as quinones in malvids (D = -0.775), Caryophyllales (D = -0.389) and superasterids (D = 0.582; Fig. 2a and Supplementary Table 3), which is driven by the presence of polyketide-derived anthraquinones in malvids and anthraquinones derived from shikimic and isochorismic acids in certain families of superasterids (including Caryophyllales) [15].
Wink [36] indicated that 20%-30% of higher plants have been investigated for phytochemistry, while our results show that ∼70% of families of seed plants have been investigated. The broad-scale nature of our study, and the many gaps in our knowledge of the presence or absence of certain groups of chemicals in plant groups, limit our ability to detect significant phylogenetic signals across all scales of inquiry. A very clear and well-known example involves the restriction of betalains to most families of Caryophyllales (i.e. the earlier Centrospermae) [37]. In most species of this order, the anthocyanin pigments that are widespread in all other angiosperms have been replaced by betalains. As analyses of PSM distributions increase, along with structural studies of the molecules, additional robust examples similar to betalains will doubtless be encountered.

PSMs and animal co-diversification
For over 50 years, it has been well understood that PSMs acting as defensive compounds in land plants have played a major role in the diversification of both the plants and the animal groups that rely on them [1,3]. Co-evolution, the process of stepwise evolution set in motion when animal herbivores gain the ability to feed on plants that are largely protected from most other groups of herbivores by the PSMs they produce, and thus gain the ability to radiate and diversify on that plant group [38], has a rich history in the literature, with literally thousands of papers having been written on the subject over the past few decades. With the recently gained ability to analyse phylogenies on both sides of the co-evolutionary race and accurate analysis of the protective compounds, studies demonstrating co-evolution have become increasingly precise and the ubiquity of the process has become evident. What remains is further consideration of the patterns and their history at deep taxonomic levels in both plants and their herbivores, and a further investigation into the evolutionary processes that shaped the pathways of secondary metabolites [39].
The weak phylogenetic signal suggests that the interaction between plants and animals during evolution may also have been an important impetus for the current high PSM diversity; our analyses are consistent with the hypothesis of herbivore/plant co-diversification. For example, although the major insect radiation began about 250 Mya [40,41], two shifts in the diversity of Lepidoptera occurred ∼85 and ∼52 Mya, and the major diversification of Coleoptera, Diptera and Hymenoptera began ∼150 Mya [40] (Fig. 3b). The major diversification of alkaloids, terpenoids, tannins and flavonoids occurred concomitantly with these events throughout the Late Cretaceous and Tertiary (Fig. 3a). Condamine et al. [41] postulated that the diversification of angiosperms did not lead to an immediate increase in insect diversification within major groups. In contrast, Zhang et al. [42] suggested that angiosperm diversification in the Cretaceous helped drive the hyperdiversity of herbivorous beetles, with ∼64% of the extant families of beetles originating during this period.
Pollination systems, which are often characteristic of species or clades, can change in a short span of evolutionary time. Volatile terpenes are often important in attracting insects and vary among species within many groups of plants (Fig. 3). We hypothesize that the development of volatile terpenes from allomones to kairomones and synomones has accompanied the evolution of anthophilous insects from their phytophagous ancestors (Fig. 3); extant pollinator groups developed phytophagy before becoming flower visitors. Likewise, pollen carrying appeared subsequent to the development of stigmas and the capacity of flowers to emit volatile terpenoids [43]. Courtois [44] found that at least two major episodes of diversification of volatile terpenes may have occurred in tropical angiosperm trees: one in magnoliids ∼122-125 Mya and the other in Sapindales ∼70 Mya. Coincidentally, the major co-evolution episode involving pollinating insects and angiosperms happened during the same periods of the Cretaceous [45]. To sum up, our analyses suggest that the evolution of the PSM toolkit in angiosperms helped to promote their explosive diversification and that of their pollinating insects (Fig. 3a and b).
Most flower colors, with a few exceptions (e.g. yellow flowers with carotenoids or reddish/purplish flowers with betalains), result from anthocyanins, a class of flavonoids that form the largest group of water-soluble natural pigments [46]. Anthocyanins play an important role in attracting insects to flowers, including the frequent ultraviolet floral markings that are not visible to human eyes but clearly visible to most insects [46]. Flowers awaiting pollination are often brightly colored, while becoming dark and much less conspicuous once they have been fertilized. The co-evolution of floral anthocyanins and pollinators has been important and ongoing for both groups since the Late Cretaceous [47]. Flowers regularly pollinated by birds are often red, signaling large energetic reserves, while being invisible to most insects [48]. Red, ripe fruits stand out vividly for birds (and for humans) but fade into the background of green leaves for insects. The colors of fruits and flowers became more diverse and began to include more red as the diversification of modern birds took place from the Upper Cretaceous Period onward with a strong increase in diversification rates from ∼50 Mya to the near present [49]. The increased rates of diversification in modern birds coincide with angiosperm diversification, with a great deal of co-evolution taking place between both groups (Fig. 3a-c).
A number of studies have noted that the evolution of seed dispersal by birds and mammals has likely contributed significantly to angiosperm diversification and to co-evolution between plants and animals [50]. Similarly, the diversification of modern mammals, a mainly Tertiary phenomenon, has been accompanied by a great deal of co-evolutionary diversification in plants. In particular for primates, frugivory seems to have predominated in some of the early-diverging primate lineages and affected their characteristics [51] (Fig. 3a and d).

Phylogenetic signals in PSMs for bioprospecting
One traditional method of bioprospecting consists of random taxon selection followed by phytochemical screening or biological assays and/or following up reports of biological activity or ethnomedical (traditional medicine) uses of plants [52]. Although ethnomedicine-based screens are expected to lead to high success rates [52,53], the plants tested are often found not to be pharmaceutically effective [53]. Since related species often share similar biochemical profiles [5,8,14], employing a phylogenetic relationship as a bioprospecting criterion should provide a solid guide to identifying species producing similar chemical compounds.
Based on large-scale patterns, our results indicate that the phylogenetic signal is generally weak, but some branches or clades (e.g. malvids) and certain PSMs (e.g. phenolic acids in fabids) show strong phylogenetic signals. The existence of a strong phylogenetic signal in PSMs may provide indirect evidence of underlying bioactivity and biochemical properties [7]. Such signals can therefore be a useful guide in studying novel natural products [5,7], finding new potentially drug-producing groups of plants [8], and certainly in searching for specific PSMs in related plants [7].

Caveats
We have collected and analysed a comprehensive PSM data set but, as mentioned above, many gaps remain in our knowledge of PSM diversity in seed plants. In addition, the D statistic is sensitive to errors in the phylogeny utilized [54]. Today, however, most relationships among seed-plant families have become clear and are highly supported by increasing amounts and multiple kinds of molecular data. As additional PSM data are reported, it will be valuable to re-evaluate phylogenetic signals among all eight biochemical groups, and especially to explore signals at lower taxonomic levels (e.g. within subfamilies or genera) and within more specific groups of chemicals.

PSM data collection
We aggregated data for four major chemical groups (which were further divided into eight classes) of PSMs that differ in their biosynthetic pathways: alkaloids, phenolics (which were subdivided into phenolic acids, flavonoids and tannins), isoprenoids (which were subdivided into terpenoids and steroids) and quinones. The eighth class, phenylpropanoids, comprised lignans and coumarins grouped together; their phenylpropanoid counterparts, the flavonoids, which are widespread and extraordinarily important in seed plants, were treated separately. For details of PSM classification, see Supplementary Table 1. Some PSMs such as glucosinolates, cyanogenic glycosides, lignin, gibberellins, abscisic acid and sterols were not considered in our study, either because they are ubiquitous or because too little is known about their distribution in seed plants to be informative. We aggregated data on PSM presence from journals, books (Supplementary Table 4) and a chemistry database (SciFinder: https://scifinder.cas.org/). The PSM data matrix used in the current analyses is provided in Supplementary Table 2.

PSM mapping
A time-calibrated family-level phylogeny was used for tracking the evolutionary patterns of PSMs across seed plants and calculating the phylogenetic signal of each PSM. We explored phylogenetic patterns of PSMs in seed plants at the family level because (i) seed-plant-family relationships are largely well resolved (e.g. APG IV [55]) and (ii) PSM distribution has not yet been worked out in detail for many of these clades. Qian and Zhang [56] suggested that using a family-level phylogeny might be informative for the phylogenetic analyses of biological and functional traits when a species-level phylogeny is unattainable. The family-level phylogenies of Qian and Zhang [56] (see Supplementary Appendix 3 of their paper) and Zanne et al. [57] were selected because they include broad sampling of all families of extant seed plants, representing the most comprehensive phylogenies available at the onset of this study. A larger seed-plant phylogeny was published by Smith and Brown [9] during the course of this project, but its higher taxonomic relationships and divergence time estimates are congruent with the phylogenies used in our analyses. The temporal diversification of PSMs was estimated by mapping PSMs on the time-calibrated phylogeny of Zanne et al. [57], in which the divergence times were estimated for 32 223 species of land plants. We collapsed the Zanne et al. [57] tree to the family level for our analyses. Angiosperm nomenclature follows APG IV [55].
We collected data on PSMs at the species or generic level for all plants but the family was the unit of analysis for all reconstructions. PSM characters were coded as binary traits: 1 if the trait was present in at least one taxon within the family and 0 if the trait was absent from, or unclear in, all known taxa within the family. Character-state distributions along the phylogeny were plotted using the 'trait.plot' function as implemented in the R package diversitree version 0.9-9 [58].
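The binary coding rule can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the record layout, the function name `code_family_matrix` and the placeholder taxa are all hypothetical. A family scores 1 for a PSM class if any of its sampled taxa has a record for that class, and 0 otherwise (absent or unclear).

```python
from collections import defaultdict

# The eight PSM classes analysed in this study.
PSM_CLASSES = [
    "alkaloids", "phenolic acids", "flavonoids", "tannins",
    "terpenoids", "steroids", "quinones", "phenylpropanoids",
]

def code_family_matrix(records, families, classes=PSM_CLASSES):
    """Collapse species- or genus-level records into a family-level 0/1 matrix.

    `records` is an iterable of (family, taxon, psm_class) tuples; this layout
    is an assumption made for the sketch.
    """
    present = defaultdict(set)
    for family, _taxon, psm_class in records:
        present[family].add(psm_class)
    return {
        family: {c: int(c in present.get(family, set())) for c in classes}
        for family in families
    }

# Placeholder records (hypothetical taxa, not data from the study).
records = [
    ("FamilyA", "species_1", "terpenoids"),
    ("FamilyA", "species_2", "tannins"),
    ("FamilyB", "species_3", "alkaloids"),
]
matrix = code_family_matrix(records, ["FamilyA", "FamilyB", "FamilyC"])
print(matrix["FamilyA"])
```

Note that under this rule a 0 cannot distinguish true absence from lack of study, which is why families such as the placeholder `FamilyC` receive 0 for every class.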

Testing for phylogenetic signals
There are multiple methods to calculate phylogenetic signals (e.g. Moran's I, Blomberg's K and Pagel's λ) [59] but only the D statistic is designed for discrete binary data and is therefore suitable for our data [54]. We used the D statistic [54] to test for phylogenetic signals for each of the eight classes of PSMs using the package caper version 1.0.1 [60] in R. The significance of the difference (p-value) between observed and expected values was investigated by using the variance of phylogenetically independent contrasts relative to 1000 tip-shuffling randomizations. The D value is considered statistically significant only when p < 0.05. A value of D < 0 indicates that a PSM class is phylogenetically conserved (i.e. there is a strong phylogenetic signal); a value of D = 0 indicates that a PSM class is clustered as one would expect if a continuous trait had evolved under a Brownian-motion model and then was converted into a binary trait using a threshold that reproduces the prevalence of the observed trait; a D value of 1 indicates a random distribution of a PSM class across the tips; and a value of D > 1 indicates phylogenetic overdispersion of a PSM class [54]. Because the D statistic is not powerful for trees with <25 tips [54], we calculated D statistics for 18 major clades of seed plants (Supplementary Table 3). To explore the robustness of phylogenetic signals with different levels of taxonomic sampling, we subsampled the observed presence data for PSMs to include 50%, 70%, 80% and 95% of observed cases, setting the rest to missing or unknown.
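The scaling behind these D values can be made concrete with a short Python sketch. It follows the usual formulation of the D statistic, in which the observed sum of sister-clade differences is rescaled between the means of two simulated sets (one from tip shuffling, one from a Brownian threshold model). It is not the caper implementation: the function names are hypothetical, the simulated sums are taken as given inputs rather than generated from a tree, and the one-sided p-value definitions are an assumption of this sketch.

```python
from statistics import mean

def d_statistic(obs_sum, random_sums, brownian_sums):
    """Scale an observed sum of sister-clade differences to the D statistic.

    D = (obs - mean(brownian)) / (mean(random) - mean(brownian)),
    so D = 0 matches the Brownian expectation and D = 1 the random one.
    """
    mean_random = mean(random_sums)
    mean_brownian = mean(brownian_sums)
    d = (obs_sum - mean_brownian) / (mean_random - mean_brownian)
    # Assumed one-sided permutation p-values:
    # share of random draws below the observed sum (test against random),
    # share of Brownian draws above the observed sum (test against Brownian).
    p_random = sum(r < obs_sum for r in random_sums) / len(random_sums)
    p_brownian = sum(b > obs_sum for b in brownian_sums) / len(brownian_sums)
    return d, p_random, p_brownian

def interpret_d(d):
    """Map a D value to the qualitative categories used in the text."""
    if d < 0:
        return "phylogenetically conserved"
    if d <= 1:
        return "between Brownian (0) and random (1)"
    return "overdispersed"

# Toy inputs (hypothetical numbers chosen only to show the arithmetic).
d, p_random, p_brownian = d_statistic(7.0, [8.0, 10.0, 12.0], [2.0, 4.0, 6.0])
print(d, p_random, p_brownian)

# D values reported in this study for terpenoids in rosids and quinones in malvids.
print(interpret_d(0.292), "|", interpret_d(-0.775))
```

In practice the two simulated sets would each contain 1000 values, matching the number of randomizations used here.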