In a recent paper in this journal, Bertrand and Härlin (2006; henceforth B&H) argued that there is a trade-off between stability and “universality” in the application of taxon names in the phylogenetic system of nomenclature (PSN) that was originally developed by de Queiroz and Gauthier (1990, 1992) and is incorporated in the PhyloCode (Cantino and de Queiroz, 2007). By “stability,” they meant stability of taxon content (i.e., clade composition), and by “universality,” they meant applicability of a name in the context of “a wide range of phylogenetic analyses and trees” (p. 848). (The term “universality” more commonly refers to the meaning of a name rather than the context in which it can be applied; see, e.g., de Queiroz and Gauthier, 1992, and references cited therein.) Specifically, B&H maintained that increasing the number of specifiers in a phylogenetic definition beyond the minimum of two, which is often done to maximize compositional stability, reduces the universality of those names, and they therefore advocated using only two specifiers. We argue here that B&H overstated the problem created by using more than two specifiers and, secondarily, that universality (hereafter used in the sense of B&H) is not always desirable.
The primary role for multiple specifiers is in the naming of a clade for which monophyly is well supported but within which the basal relationships are uncertain (e.g., for a node-based name) or for which the sister group is uncertain (e.g., for a branch-based name). In either case, using multiple specifiers permits construction of an explicit phylogenetic definition that will apply to the same clade composition under a variety of plausible phylogenies, whereas using only two specifiers increases the risk that further refinement in phylogenetic understanding will result in the name applying to a different clade composition than was intended. We consider it undesirable to sacrifice compositional stability (by reducing the number of specifiers) in order that users can apply the name in the context of a higher proportion of published phylogenies without considering external information. We believe that compositional stability, gained by using more than two specifiers when needed, is a more important goal of biological nomenclature than universality (in the sense of B&H). A second role for multiple specifiers—to restrict a clade name to a particular phylogenetic hypothesis—is also discussed briefly below.
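The stabilizing effect of additional specifiers can be illustrated with a toy computation. The sketch below is ours and rests on simplifying assumptions: trees are nested Python tuples, the taxon labels A, B, C, and O are hypothetical, the named clade is intended to comprise A, B, and C, and the function names are illustrative rather than part of any PhyloCode tooling. A node-based definition is read as the least inclusive clade containing all internal specifiers. With two specifiers (A and B), the content of the named clade changes between two resolutions of its basal relationships; with a specifier from each potentially basal subclade (A, B, and C), it does not.

```python
# Hypothetical toy sketch: trees are nested tuples whose leaves are taxon labels.

def leaves(tree):
    """Return the set of terminal taxa in a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the taxon content of every clade (node) in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def node_based(tree, specifiers):
    """Least inclusive clade containing all internal specifiers.

    Returns None when a specifier is absent from the tree, i.e., the
    literal reading in which every specifier must appear on the tree.
    """
    wanted = set(specifiers)
    if not wanted <= leaves(tree):
        return None
    containing = [c for c in clades(tree) if wanted <= c]
    return min(containing, key=len)  # clades that share taxa are nested


# Two alternative resolutions of the base of the intended clade {A, B, C}.
tree1 = ("O", (("A", "B"), "C"))
tree2 = ("O", ("A", ("B", "C")))

for specs in (["A", "B"], ["A", "B", "C"]):
    print(specs, sorted(node_based(tree1, specs)), sorted(node_based(tree2, specs)))
# ['A', 'B'] ['A', 'B'] ['A', 'B', 'C']
# ['A', 'B', 'C'] ['A', 'B', 'C'] ['A', 'B', 'C']
```

Taking the smallest qualifying clade is safe here because, on a tree, any two clades that share a taxon are nested.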
Specifiers are the species, specimens, and apomorphies that are cited in a phylogenetic definition as reference points to specify the clade to which the name applies (Cantino and de Queiroz, 2006). Every phylogenetic definition includes at least two specifiers, but it is common practice to use more than two. The reasoning of B&H was essentially as follows: (1) To apply a previously defined name in the context of a new phylogenetic tree (i.e., to identify the clade, if any, to which the name applies), every specifier in the definition must be present on that tree. (2) The probability that all of the specifiers in a phylogenetic definition will be found in any given tree decreases as the number of specifiers increases. (3) Therefore, increasing the number of specifiers beyond the minimum of two reduces the probability that the definition can be applied in the context of a wide range of trees, i.e., the universality of the name. They went on to argue that definitions should be restricted to two specifiers because maximizing universality of taxon names is more important than stabilizing taxon content. We disagree with their first premise and therefore with their conclusions.
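Premise (2) can be made concrete under a simplifying assumption that is ours, not B&H's: if each of the k specifiers independently has probability p of appearing on a given tree, then

$$P(\text{all } k \text{ specifiers present}) = p^{k}, \qquad 0 < p < 1,$$

which falls as k grows. With p = 0.8, for example, the probability is 0.64 for k = 2 but about 0.33 for k = 5.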
To demonstrate their point, B&H took a number of names that have been phylogenetically defined within two groups (Lamiales [an angiosperm clade] and Foraminifera) and inspected a series of published trees to see whether they could identify the clades to which the names applied by finding the specifiers on those trees. In most cases, one or more of the specifier species were absent from the tree, and B&H concluded that the name could not be applied on that tree. Similarly, they concluded that any definition that used an apomorphy specifier could not be applied on a tree on which that apomorphy had not been optimized.
B&H's central premise—that every specifier must be present on a tree in order to apply a phylogenetically defined name in the context of that tree—assumes that the user knows nothing about the systematics of the group other than the information in the tree. This is an inappropriate assumption under either the PSN or the traditional rank-based system of nomenclature (RSN). Under both systems, one would expect that systematists knowledgeable about a particular group would be able to identify a taxon on a tree, regardless of whether the specifiers (or the type, in the case of the RSN) are represented. Under either system, it is up to the author of a particular study, or the reader interpreting the study, to know the group well enough to be able to determine the positions of taxa on a tree. Absence of a specifier on a given tree does not prevent identification of the taxon to which the name applies, just as absence of a type on a given tree does not prevent identification of the taxon to which the name applies under the RSN. Phylogenetic systematists using the RSN routinely apply names to taxa whose type species are absent from a particular tree. They do so because they are aware of evidence from other studies that the type is related to some other taxon that is represented on that tree. The situation is no different with phylogenetically defined names. Granted, there may be some users of the literature who are not familiar enough with the group concerned to figure out where a particular name applies on a tree, but this does not decrease the explicitness of the phylogenetic definition. By analogy, a set of instructions for assembly of a piece of equipment is not considered inexplicit just because some readers are unfamiliar with the names of some of the components.
Two of B&H's examples illustrate our point. In their table 3, they stated that our definition (Cantino and Olmstead, 2004) of Lamioideae (the most inclusive crown clade containing Lamium purpureum but not Scutellaria galericulata) cannot be applied in the context of the Cantino et al. (1997) consensus tree because that tree included Scutellaria bolanderi instead of S. galericulata. Those familiar with Lamiaceae will know that the monophyly of Scutellaria is strongly corroborated by several distinctive synapomorphies. It is therefore justifiable, when applying this definition to that tree, to assume that S. bolanderi occupies the position that S. galericulata would have occupied had it been included in the analysis. Contrary to B&H, this reasoning does not use “genus information as surrogate for phylogenetic information” (p. 851); it is based on evidence for monophyly, which would be equally applicable if Scutellaria were recognized at some other rank (or left unranked). Of course, one has to be somewhat knowledgeable about the group to know which species on a tree can be treated as placeholders for specifiers, but this expectation (which holds under both systems of nomenclature) is not unreasonable.
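The contrast between B&H's literal reading and the placeholder reasoning described above can be sketched for a definition of the form “the most inclusive clade containing A but not B.” In this hypothetical example, B is absent from the tree, and B2 is a taxon that external evidence (e.g., strong support for the monophyly of a group containing both) is assumed to place in B's position. The tree, labels, and function names are illustrative only, and the crown-clade restriction in the Lamioideae definition is ignored.

```python
# Hypothetical toy sketch: trees are nested tuples whose leaves are taxon labels.

def leaves(tree):
    """Return the set of terminal taxa in a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the taxon content of every clade (node) in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def branch_based(tree, internal, external, placeholders=None):
    """Most inclusive clade containing all internal and no external specifiers.

    `placeholders` maps an absent specifier to a taxon on the tree that the
    user, from external evidence, judges to occupy the same position.
    """
    placeholders = placeholders or {}
    taxa = leaves(tree)

    def resolve(sp):
        # Use the specifier itself if present, else its declared stand-in.
        if sp in taxa:
            return sp
        proxy = placeholders.get(sp)
        return proxy if proxy in taxa else None

    inside = [resolve(s) for s in internal]
    outside = [resolve(s) for s in external]
    if None in inside or None in outside:
        return None  # literal reading: a specifier (or its stand-in) is missing
    candidates = [c for c in clades(tree)
                  if set(inside) <= c and not (set(outside) & c)]
    return max(candidates, key=len) if candidates else None


tree = ("O", ("B2", ("A", "C")))  # hypothetical topology; B is not sampled
print(branch_based(tree, ["A"], ["B"]))                             # None
print(sorted(branch_based(tree, ["A"], ["B"], {"B": "B2"})))       # ['A', 'C']
```

The only difference between the two calls is whether the user is allowed to bring outside knowledge to bear, which is the point at issue.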
The situation is similar when one of the specifiers is an apomorphy. B&H claimed that our definition of Nepetoideae (the most inclusive crown clade exhibiting hexacolpate pollen synapomorphic with that in Nepeta cataria) cannot be applied in the context of a tree published by Kaufmann and Wink (1994) because the distribution of hexacolpate pollen is not shown on that tree. However, the distribution of hexacolpate pollen in Lamiaceae was tabulated by Cantino and Sanders (1986). When these data are taken into consideration, it is easy to determine which clade on Kaufmann and Wink's tree is Nepetoideae. We agree, though, that using morphological apomorphies as specifiers will sometimes make it more difficult to apply names on molecular trees, particularly if there are many species that have not been examined with regard to the character of concern.
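In the same spirit, an apomorphy specifier can be located on a tree that carries no character data by consulting an external tabulation of the character. The sketch below is a deliberately crude simplification, not a character optimization: it takes the most inclusive clade containing a reference species in which every terminal is scored as having the trait in a hypothetical lookup table, and it ignores reversals, homoplasy, and unscored taxa. The tree, taxa, and scores are invented for illustration.

```python
# Hypothetical toy sketch: trees are nested tuples whose leaves are taxon labels.

def leaves(tree):
    """Return the set of terminal taxa in a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the taxon content of every clade (node) in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def apomorphy_based(tree, reference, trait_table):
    """Crude stand-in for an apomorphy-based definition.

    Most inclusive clade containing `reference` whose terminals are all
    scored True in an external trait table; unscored taxa count as False.
    """
    if reference not in leaves(tree) or not trait_table.get(reference, False):
        return None
    candidates = [c for c in clades(tree)
                  if reference in c and all(trait_table.get(t, False) for t in c)]
    return max(candidates, key=len)


tree = ("O1", ("O2", (("N", "P"), "Q")))  # tree alone says nothing about the trait
scores = {"N": True, "P": True, "Q": True, "O1": False, "O2": False}  # hypothetical

print(apomorphy_based(tree, "N", {}))              # None: no character data
print(sorted(apomorphy_based(tree, "N", scores)))  # ['N', 'P', 'Q']
```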
It is often the case that several phylogenetic analyses of a particular group have been conducted with different data sets. The taxonomic sample may be broader in some studies and denser in others. In most large groups, hierarchically nested sets of phylogenetic studies have been published. A particular species may have been used in some analyses but not others, but it is frequently possible to determine where a missing species fits on a tree from the nesting relationships of its relatives. When constructing definitions for names within a group, a careful systematist would want to consider all of the available studies of that group, not just the positions of the species that happen to have been used in every analysis. Consequently, phylogenetic definitions are likely to include species that are not present on every tree that was considered when constructing the definition. B&H would apparently argue that such definitions cannot be applied on some of these trees. In contrast, we maintain that such definitions can be applied in the context of any of these trees when the others are considered as well, just as the author of the definition considered all of these trees when constructing it. This sort of reasoning underlies “supertree” approaches for finding phylogenetic consensus among studies with partially overlapping taxon sampling (Sanderson et al., 1998; Bininda-Emonds, 2004) and “supermatrix” approaches for obtaining trees from analysis of large sets of partially overlapping data (e.g., Driskell et al., 2004), and it recognizes that no single study in the phylogenetic literature can provide all of the information needed as a basis for taxonomic and nomenclatural decisions.
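One simple way to use a densely sampled tree to apply a node-based definition on a sparser one is to search for stand-ins: taxa present on the sparser (target) tree that, on the denser (reference) tree, yield the same clade as the absent specifier when paired with the remaining specifiers. The sketch below assumes that the two trees agree on the relevant relationships; it is a toy proxy check, not a supertree or supermatrix method, and all trees, labels, and function names are hypothetical.

```python
# Hypothetical toy sketch: trees are nested tuples whose leaves are taxon labels.

def leaves(tree):
    """Return the set of terminal taxa in a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the taxon content of every clade (node) in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def node_based(tree, specifiers):
    """Least inclusive clade containing all specifiers (None if any is absent)."""
    wanted = set(specifiers)
    if not wanted <= leaves(tree):
        return None
    return min((c for c in clades(tree) if wanted <= c), key=len)


def find_proxies(reference, absent, partners, target_taxa):
    """Taxa on the target tree that, on the reference tree, give the same
    node-based clade as `absent` when combined with `partners`."""
    intended = node_based(reference, [absent, *partners])
    ref_taxa = leaves(reference)
    return sorted(z for z in target_taxa
                  if z in ref_taxa and z != absent
                  and node_based(reference, [z, *partners]) == intended)


reference = ("O", (("A", "X"), ("B", "C")))  # denser study: includes X
target = ("O", ("A", ("B", "C")))            # sparser study: X not sampled

print(node_based(target, ["X", "B"]))        # None under the literal reading
proxies = find_proxies(reference, "X", ["B"], leaves(target))
print(proxies)                                      # ['A']
print(sorted(node_based(target, [proxies[0], "B"])))  # ['A', 'B', 'C']
```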
Bertrand and Härlin noted correctly (p. 856) that explicitness is one of the advantages of the PSN, but they went on to imply that use of more than two specifiers jeopardizes the explicitness of PSN. On the contrary, a phylogenetic definition with ten specifiers is every bit as explicit as one with two, and its application may be unambiguous if the user considers a variety of published information to determine the positions of specifiers that don't happen to be present on a particular tree. In the same vein, a reviewer characterized our viewpoint as follows: “Scientists working with a group of organisms have an intuitive feeling for what should, and should not, be included in the group. Therefore, it is not that important for all the specifiers to be present because a knowledgeable person could infer the inclusiveness of a taxon anyway.” This is a misrepresentation of our argument. Our central point is that users of phylogenetic definitions can be expected to make use of all of the relevant information at their disposal, not just the information in a particular tree. There is nothing intuitive about combining data from different trees.
Most of the trees on which B&H tried to apply names were published before the definitions were constructed. In this regard, they stated (p. 851): “It may be argued that we do not give a fair picture of the PhyloCode's full potential because most of the phylogenies under scrutiny were constructed before the phylonames. However, we nevertheless believe that we provide a realistic estimation of difficulties that the PSN will encounter in practice once implemented.” On the contrary, we maintain that retroactively applying definitions to old trees is bound to yield more examples of missing specifiers than applying definitions to trees published afterward. Consider their Verbenaceae example (p. 851). Our definition of Verbenaceae was based on an unpublished cladogram that included 37 members of this clade and 11 outgroups. In contrast, the Wagstaff and Olmstead (1997) tree, to which B&H compared our definition before concluding that it is inapplicable, included only six members of this clade. Once these new data are published, future studies focusing on Verbenaceae are unlikely to ignore them and include only a few exemplars, as the 1997 study did. Of course, future studies focusing on larger clades may include only a few exemplars of Verbenaceae, but the authors of such studies would be wise to choose their exemplars from among the specifiers of this name (if being able to apply the name in the context of their tree is important). And if they do not, the argument presented above still applies: one could determine how to apply the name Verbenaceae in the context of a tree that includes only a few of the specifiers by combining that tree with other published information.
Bertrand and Härlin presented this issue as a concern about the PSN. It is important, however, to distinguish the PSN from the PhyloCode. The PSN is a general approach that assigns names to clades by means of phylogenetic definitions; the PhyloCode is a particular set of rules and recommendations for using the PSN. We believe that B&H overstated the drawbacks of using more than two specifiers, but to the extent that the drawbacks exist, they apply only to a few recommendations in the current draft of the PhyloCode, not to the PSN more generally. Furthermore, B&H's characterization of the PhyloCode in this regard is not entirely accurate. On the subject of “choice of specifiers,” B&H stated (p. 854), “the PhyloCode recommends selecting many basal taxa as specifiers in node-based definitions and many external taxa to the clade being named in stem-based definitions.” On the contrary, the PhyloCode recommends using multiple specifiers only as needed to minimize uncertainty in resolution around a node. This is quite different from recommending “selecting many specifiers” in an absolute sense. What the PhyloCode actually says (Recommendations 11D and 11E), as B&H pointed out elsewhere (p. 849), is “In a node-based definition, it is best to use a set of internal specifiers that includes representatives of all subclades that credible evidence suggests may be basal within the clade being named…” and “[I]n a branch-based definition, it is best to use a set of external specifiers that includes representatives of all clades that credible evidence suggests may be the sister group of the clade being named.”
Bertrand and Härlin are concerned that using multiple specifiers may constrain future phylogenetic studies, because researchers will be obligated to include those specifiers. This concern follows from their central premise that all specifiers must be present on a tree to apply a name. Since we reject this premise, we also disagree that there is any such obligation. Well-chosen specifiers for a node-based name, regardless of their number, will represent the clades arising most closely from the root node of the named clade. Any good phylogenetic analysis should also strive to include representatives of those descendant clades. Thus, regardless of whether specifiers have been chosen for a name, one would expect the taxon sampling in a good phylogenetic study to parallel a well-chosen set of specifiers. However, phylogenetic research is conducted for a variety of purposes, of which classification is only one. If for no other reason than this, one should not expect all published phylogenies to include specifiers, types, or any other species of interest to taxonomists who are in the business of identifying and naming taxa.
B&H consider it very important that taxon names have high universality—i.e., that they “be applicable in a wide range of biological contexts” (p. 848). We agree in general, but there are certain situations in which low universality is desirable. If a name is coined in the context of a particular phylogenetic hypothesis and associated with that hypothesis in people's minds (e.g., “Paleoherbs”; see Bryant and Cantino, 2002), clear communication may be better served by defining the name so that it is only applicable under that hypothesis. Restriction of a name to a particular hypothesis can be accomplished with qualifying clauses (PhyloCode Art. 11.8, Example 1) or through the use of more than the minimum number of specifiers (Art. 11.8, Example 3).
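A qualifying clause of the kind mentioned above can be modeled as a condition that makes a definition inapplicable when a tree contradicts the hypothesis with which the name is associated. In this hypothetical sketch, the name refers to the least inclusive clade containing A and B, provided that this clade does not include C; the code does not reproduce the wording of any PhyloCode example, and the taxa and function names are illustrative.

```python
# Hypothetical toy sketch: trees are nested tuples whose leaves are taxon labels.

def leaves(tree):
    """Return the set of terminal taxa in a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the taxon content of every clade (node) in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def node_based(tree, specifiers):
    """Least inclusive clade containing all specifiers (None if any is absent)."""
    wanted = set(specifiers)
    if not wanted <= leaves(tree):
        return None
    return min((c for c in clades(tree) if wanted <= c), key=len)


def node_based_qualified(tree, specifiers, must_not_include):
    """Node-based clade, but only if it excludes the listed taxa.

    Returns None (name not applicable) if the tree contradicts the
    qualifying clause or lacks the taxa needed to evaluate it.
    """
    clade = node_based(tree, specifiers)
    excluded = set(must_not_include)
    if clade is None or not excluded <= leaves(tree) or excluded & clade:
        return None
    return clade


h1 = ("O", (("A", "B"), "C"))  # hypothesis under which the name was coined
h2 = ("O", (("A", "C"), "B"))  # competing hypothesis

print(sorted(node_based_qualified(h1, ["A", "B"], ["C"])))  # ['A', 'B']
print(node_based_qualified(h2, ["A", "B"], ["C"]))          # None
```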
One situation in which B&H's central premise may be correct is in the use of phylogenetic definitions in phyloinformatics (e.g., Hibbett et al., 2005). A computer that is asked to infer the meaning of a name by applying a definition to a tree will presumably not have access to the full breadth of external knowledge that a systematist has (though a nested hierarchy of phylogenetic information is already used in phyloinformatics; e.g., in TreeBASE [http://www.treebase.org/treebase/index.html]). The computer is therefore comparable to the naive user that B&H seem to have in mind in framing their argument. Those who construct phylogenetic definitions should consider the value of definitional simplicity (for the benefit of phyloinformatics) as well as the value of compositional stability (for the benefit of the many other users of taxon names). In some cases, the balance may favor fewer specifiers, and therefore less compositional stability, than if phyloinformatics were not a consideration, but these decisions should be made on a case-by-case basis by knowledgeable systematists. B&H's recommendation that phylogenetic definitions be restricted to two specifiers is unnecessary, and its adoption would greatly reduce the value of the PSN to the many systematists who consider compositional stability to be important.
We thank J. Anderson, M. J. Donoghue, T. Eriksson, M. Härlin, M. Lee, and M. Thollesson for their helpful comments on the manuscript.