A Semantic Model for Species Description Applied to the Ensign Wasps (Hymenoptera: Evaniidae) of New Caledonia

Taxonomic descriptions are unparalleled sources of knowledge of life's phenotypic diversity. As natural language prose, these data sets are largely refractory to computation and integration with other sources of phenotypic data. By formalizing taxonomic descriptions using ontology-based semantic representation, we aim to increase the reusability and computability of taxonomists' primary data. Here, we present a revision of the ensign wasp (Hymenoptera: Evaniidae) fauna of New Caledonia using this new model for species description. Descriptive matrices, specimen data, and taxonomic nomenclature are gathered in a unified Web-based application, mx, then exported as both traditional taxonomic treatments and semantic statements using the OWL Web Ontology Language. Character:character-state combinations are then annotated following the entity–quality phenotype model, originally developed to represent mutant model organism phenotype data; concepts of anatomy are drawn from the Hymenoptera Anatomy Ontology and linked to phenotype descriptors from the Phenotypic Quality Ontology. The resulting set of semantic statements is provided in Resource Description Framework format. Applying the model to real data, that is, specimens, taxonomic names, diagnoses, descriptions, and redescriptions, provides us with a foundation to discuss limitations and potential benefits such as automated data integration and reasoner-driven queries. Four species of ensign wasp are now known to occur in New Caledonia: Szepligetella levipetiolata, Szepligetella deercreeki Deans and Mikó sp. nov., Szepligetella irwini Deans and Mikó sp. nov., and the nearly cosmopolitan Evania appendigaster. A fifth species, Szepligetella sericea, including Szepligetella impressa, syn. nov., has not yet been collected in New Caledonia but can be found on islands throughout the Pacific and so is included in the diagnostic key. 
[Biodiversity informatics; Evaniidae; New Caledonia; new species; ontology; semantic phenotypes; semantic species description; taxonomy.]

Taxonomic descriptions constitute an invaluable source of knowledge about phenotypic diversity across the living world. However, these phenomic annotations are not readily accessed and reused by other biological scientists. Instead, they are "locked away" in the taxonomic literature, written for, and consumed almost exclusively by, taxonomists. While electronic availability of taxonomic treatments is rapidly growing, reflected in changes to publication requirements (International Commission on Zoological Nomenclature 2012), and accelerated by digitization efforts such as the Biodiversity Heritage Library (http://www.biodiversitylibrary.org/ last accessed May 13, 2013), the constituent phenotypic descriptions are composed in natural language (NL), typically making use of specialized anatomical terminology. These phenotypic descriptions are difficult to data-mine (though see Cui (2012); Thessen et al. (2012)); one reason for this is rampant homonymy and synonymy across anatomical concepts (Yoder et al. 2010). We recently proposed that the application of ontological annotation to taxonomic descriptions, as semantic phenotypes (SPs), would allow a broader array of researchers to apply powerful data integration, search, and automated reasoning techniques to these data, increasing the value of taxonomic work and promoting its reuse.
An ontology is a formal representation of concepts within a domain and the logical relationships between those concepts, supporting knowledge representation with explicit semantics. By referencing standard, shared concepts, diverse data sets can be aggregated and computed over coherently (Washington et al. 2009; Walls et al. 2012). Biological ontologies have become a standard tool for organizing and accessing genomic and phenotypic data from taxonomically isolated model species (Mungall et al. 2010). Applying these tools to the representation and dissemination of comparative descriptive data offers a means to make connections of phenotypic and genomic information across these different, but closely related, sciences (Mabee et al. 2007; Deans et al. 2012).
The Phenoscape project has pioneered the application of ontological annotation to evolutionary phenotypes, by annotating morphological character matrix data sets from the published fish systematics literature (Dahdul et al. 2010a), and demonstrating semantic correspondences to mutant phenotype annotations from the Zebrafish Information Network, ZFIN (Mabee et al. 2012). We believe that taxonomists can build on this approach by incorporating ontological annotation into descriptions at the time of publication, thereby increasing the efficiency and scalability of SP annotation. As taxonomists adopt this approach, tools may be developed that facilitate referencing and integrating with existing semantic data as part of the process of creating new descriptions. Here, we discuss our first steps toward meeting this goal by describing a model for SPs, by using the model with real data, and by discussing directions for further advancing these methods.
640 SYSTEMATIC BIOLOGY VOL. 62
TABLE 1.
Ontology | URL | Content and example terms
Hymenoptera Anatomy Ontology (HAO) | http://purl.obolibrary.org/obo/hao.owl | Anatomy of hymenopterans, e.g., "seta", "mesosoma"
Phenotype and Trait Ontology (PATO) | http://purl.obolibrary.org/obo/pato.owl | Phenotypic qualities, e.g., "blue", "foveate", "sigmoid"
Biospatial Ontology (BSPO) | http://purl.obolibrary.org/obo/bspo.owl | Specification of spatial regions within anatomical parts, e.g., "lateral region", "anterior margin"
Comparative Data Analysis Ontology (CDAO) | http://purl.obolibrary.org/obo/cdao.owl | Data matrix elements, e.g., "Standard Character", "TU"
Darwin-SW | http://purl.org/dsw/ | Specimen metadata, e.g., "Specimen", "Identification"
Information Artifact Ontology (IAO) | http://purl.obolibrary.org/obo/iao.owl | Information entities, e.g., "denotes" property
Relations Ontology (RO) | http://purl.obolibrary.org/obo/ro.owl | Standard property definitions, e.g., "part of", "is bearer of"
Our current approach to integrating semantic components with taxonomic treatments is to provide a "traditional" NL description in parallel with a set of semantic annotations. These annotations are made possible by the advent of a relatively new data construct, the multispecies anatomy ontology, which contains generalized semantic definitions for anatomical concepts across a broad taxonomic group (Dahdul et al. 2010b). Annotations (SPs) are logically composed using references to these anatomical concepts, in conjunction with descriptive concepts from other ontologies such as PATO, an ontology of phenotypic descriptors (Mungall et al. 2010), and BSPO, an ontology of biospatial terms, for example, "proximal" and "anterior" (Table 1).
Here, we demonstrate the application of SPs to the taxonomy of ensign wasps (Evaniidae). By working with real data early in the development of methodologies employing SPs, we identify and address some of the limitations of this approach. We conclude with a discussion as to both the real and perceived future problems of SPs as used here and highlight potential explorations that may advance the approach in subsequent work.
Ensign wasps are charismatic hymenopterans that develop as solitary predators of cockroach (Blattodea) eggs inside oothecae (Deans 2005). Despite their ubiquity in the tropics and ease of diagnosis at the family and genus level, very little is known of ensign wasp behavior, host associations, ecology, and other aspects of their biology. This situation is largely the result of an unsystematic species-level taxonomy and the paucity of tools for identification. In this article, we revise the ensign wasp fauna of New Caledonia, a relatively remote island in the South Pacific that is celebrated as a biodiversity hotspot with extraordinarily high levels of endemism [∼80% of plant species found in New Caledonia are endemic (Morat 1993; Jaffré et al. 2001)] and microendemism (intraisland endemism; see Murienne et al. 2005). Given the size of this landmass, its geologic history, and its unique flora and fauna, New Caledonia has served as a laboratory for testing hypotheses about biogeography and speciation. Several of these studies have focused on cockroaches (Murienne et al. 2005, 2008), which will undoubtedly be relevant to future, finer-scale studies aimed at understanding the Evaniidae of New Caledonia. Deans (2005) cataloged the world ensign wasp fauna, and Deans et al. (2006) provided the first phylogenetic estimation of generic relationships. These two publications serve as the foundation for an active program in ensign wasp systematics, which will hopefully remove roadblocks to future ensign wasp research (e.g., Kawada and Azevedo 2007; Deans and Kawada 2008; Mullins et al. 2012). Our current focus is to provide an updated classification for this fauna and to provide the tools necessary for species-level diagnosis.

Modeling SPs
Our SPs follow the entity-quality (EQ) approach, meaning we draw "entity" terms from an organism-specific anatomy ontology, and phenotypic quality terms from a taxon-agnostic ontology of qualitative descriptors. EQ is a guiding principle for formulating ontological class expressions (SPs), which represent the class of organisms that a given character state denotes. The phenotype class expressions we created follow the approach advocated by Mungall et al. (2007), which we extended to generally adhere to four "template" structures, and to draw on a rigorous classification of morphological characters (Sereno 2007) which is consistent with EQ (Dahdul et al. 2010a). For this study, entity terms were drawn from the Hymenoptera Anatomy Ontology (HAO) and the Biospatial Ontology (BSPO), whereas quality descriptors were drawn from the Phenotype and Trait Ontology (PATO).

Observations
Diagnostic characters were discovered during direct examination of specimens under an Olympus SZX16 stereomicroscope and by comparing standard view images of specimens. Digital images were made using an Olympus CX41 compound microscope, equipped with a DP71 digital camera. SEM micrographs were taken with a Philips XL30 ESEM-FEG (ISU) and an FEI Nova 400 NanoSEM (FSU) on Au-Pd-coated specimens. Original images are deposited at Morphbank (http://morphbank.net last accessed May 13, 2013), as image collection 783132. Verbatim specimen label data and museum coden information are provided in online Appendix 2. Taxonomic nomenclature, specimen data, supporting images, and character matrix-based descriptive statements were compiled in the open-source web application mx (http://purl.org/NET/mxdatabase last accessed May 13, 2013) through interactive forms and integrated batch-uploading. NL treatments that include nomenclatural, descriptive, and material-related sections were rendered from these data using automated mechanisms included in mx. NL character descriptions were formulated with ontology annotation in mind, and in some cases revised to facilitate annotation.

Generating SPs
Once the data for the complete set of taxonomic treatments were captured within mx, the descriptive matrix elements and specimen identifiers were exported to OWL, using functions newly developed for this purpose within the mx application (Fig. 1). The OWL-formatted descriptive data were represented using terms from the Comparative Data Analysis Ontology (CDAO) (Prosdocimi et al. 2009) and Darwin-SW, an ontology of Darwin Core terms (http://code.google.com/p/darwin-sw/ last accessed May 13, 2013). These data, along with ontologies listed in Table 1, were loaded into Protégé 4.1 (http://protege.stanford.edu/ last accessed May 13, 2013). Navigation of the descriptive matrix elements in Protégé was aided by a custom-built plugin, available from the Github source code repository (https://github.com/balhoff/cdao-protege last accessed May 13, 2013). SP annotations were added to character states within Protégé as OWL class expressions using the built-in Manchester syntax editor (http://www.w3.org/TR/owl2-manchester-syntax/ last accessed May 13, 2013). These phenotype annotation axioms were created in a separate OWL file which imported the character matrix OWL file exported from mx; this allowed edits to character data within mx to be integrated with in-progress phenotype annotation work in Protégé, by re-exporting and replacing the character matrix file (Fig. 1).

Querying the OWL Data Set
Summary queries over the OWL-annotated data set were performed using custom programs written in Scala, using the OWL API programming library, version 3.2.4 (http://owlapi.sourceforge.net/ last accessed May 13, 2013) and the FaCT++ Description Logic reasoner, version 1.5.3 (http://code.google.com/p/factplusplus/ last accessed May 13, 2013). We performed the same queries over a comparison data set covering another ensign wasp genus, generated in parallel to this one.

SP Model
The SP expressions we created build on the OWL representation for EQ explicated by Mungall et al. (2007). In this article, all ontological expressions are presented using the OWL Manchester syntax, a user-friendly but precise syntax for OWL 2 descriptions (http://www.w3.org/TR/owl2-manchester-syntax/ last accessed May 13, 2013). For example, the "some" keyword in the following expressions signifies an existentially quantified property restriction. For a basic phenotype such as "wing shape: curved," Mungall et al. (2007) demonstrate two cognate forms of OWL phenotype expressions: those described from the perspective of the quality, for example, "curved and inheres_in some wing" (their so-called "normal-form"), and those described from the perspective of the entity, for example, "wing and bearer_of some curved." We adopted the entity-based form, which can be more conveniently associated with a specimen exhibiting the phenotype, through a has_part relationship:
    Individual: _:specimen1234
        Types: has_part some (wing and bearer_of some curved)
It is not necessary to explicitly include "shape" in the expression; the knowledge that "curved" is a subtype of "shape" is built into the PATO ontology.
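The two cognate forms can be illustrated with a small string-building sketch. This is only a minimal illustration in Python, not the tooling used in this study; the function names are hypothetical, and the output is plain Manchester-syntax text rather than a parsed OWL object.

```python
# Minimal sketch (hypothetical helper names): compose the two cognate
# EQ forms as Manchester-syntax strings.

def quality_form(entity: str, quality: str) -> str:
    """Quality-centred ("normal-form") expression."""
    return f"{quality} and inheres_in some {entity}"


def entity_form(entity: str, quality: str) -> str:
    """Entity-centred expression, the form adopted in the text."""
    return f"{entity} and bearer_of some {quality}"


def specimen_type(entity: str, quality: str) -> str:
    """Class expression asserted as a type of a specimen via has_part."""
    return f"has_part some ({entity_form(entity, quality)})"


if __name__ == "__main__":
    print(quality_form("wing", "curved"))
    print(specimen_type("wing", "curved"))
```

The entity-centred string nests directly under has_part, which is why it attaches to a specimen more conveniently than the quality-centred form.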
In many cases, the structure that bears the quality being described is an instance of a "general" class, which must be further specified. For example, this may be a class of repeated anatomical structure, such as "bristle," or a class denoting an abstract spatial region; both of these require localization to a specific, containing, anatomical structure, such as "mesosoma." In the terminology of Sereno (2007), for a character such as "ventro-lateral region of mesosoma texture: foveate," the entity bearing the quality (here, ventro-lateral region) is the primary locator, whereas the containing structure is a secondary locator (here, mesosoma). We used a nested series of has_part restrictions, mapping neatly to Sereno's secondary locator(s), L:
    has_part some (L and has_part some (E and bearer_of some Q))
e.g.
    has_part some (mesosoma and has_part some ("ventro-lateral region" and bearer_of some foveate))
from online Appendix 1, character 24, "Ventro-lateral region of mesosoma texture: foveate." Context-dependent anatomical entities such as the above are often described as standalone expressions using a "post-composition" approach in other EQ annotation software, such as Phenex (Balhoff et al. 2010).
The entity portion of the EQ might be described as an instance of a class such as:
    "ventro-lateral region" and part_of some mesosoma
giving an EQ such as:
    has_part some (("ventro-lateral region" and part_of some mesosoma) and bearer_of some foveate)
However, we found that postcompositions in our annotations were nearly always used to express parthood relationships; these structures could instead be represented using the aforementioned has_part chain, which provides two advantages over the part_of construction: (i) the entity class is more proximately associated with the quality it bears within the Manchester syntax expression, helping the human annotator verify correctness of the expression and (ii) an automated reasoner can infer that the "locator" structure is part of the same organism as the entity structure, a fact that is not implied by the semantics of the part_of-based class expression. Building upon the basic EQ construct just described, we identified four template EQ expressions which could be used to express the meaning of the various character forms in the matrix:
1. Qualitative phenotypes describe a phenotypic quality (Q) borne by a given structure (E):
    has_part some (compound_eye and bearer_of some blue)
from online Appendix 1, character 9, "Eye color: blue."
2. Presence/absence phenotypes describe the presence or absence of a given structure (E):
    not (has_part some antennal_shelf)
from online Appendix 1, character 8, "Antennal shelf presence: absent."
3. Count phenotypes describe the number of instances (n) of a given structure (E):
    has_part some (mandible and (has_component exactly 3 tooth))
from online Appendix 1, character 11, "Mandibular teeth count: 3." The property has_component, a nontransitive subproperty of has_part from the OBO Relations Ontology, must be used within count phenotypes due to the OWL DL prohibition of using transitive properties, such as has_part, in cardinality restrictions.
4. Relative measurement phenotypes describe the size of one structure relative to another, allowing a character to be repeatably evaluated across organisms of differing absolute size. These characters may indicate a directional difference in size:
    has_part some (E1 and bearer_of some (Q and increased_in_magnitude_relative_to some (Q and inheres_in some E2)))
e.g.
    has_part some (scape and bearer_of some (length and increased_in_magnitude_relative_to some (length and inheres_in some compound_eye)))
from online Appendix 1, character 12, "Female scape length: greater than eye height." Or a difference of a specific magnitude (n) expressed as the size of one structure using the size of the other as units:
    has_part some (E1 and bearer_of some (Q and (has_measurement some ((has_unit some Q and inheres_in some E2) and (has_magnitude value n)))))
e.g.
    has_part some (seta and bearer_of some (length and (has_measurement some ((has_unit some diameter and inheres_in some ocellus) and (has_magnitude some float[>2.0f])))))
from online Appendix 1, character 10, "Long setae (length >2× ocellus diameter) presence: present."
Relative measurement phenotypes highlighted an important limitation of OWL class expressions: without the ability to include variables within the expression, it was impossible to fully represent the intended meaning (Motik et al. 2009). Namely, a critical aspect of the phenotype is that the two structures being compared are components of the very same organism or containing structure. For a phenotype such as "antenna longer than eye," we might create the following class expression:
    has_part some (antenna and bearer_of some (length and increased_in_magnitude_relative_to some (length and inheres_in some eye)))
Unsatisfyingly, to be an instance of this class, an antenna needs to merely be longer than at least one eye in the world, not necessarily an eye possessed by the same organism. Although the semantics of the above phenotype description are not complete with respect to the meaning of the character, they still provide useful information about character data and specimens by making it clear that the given character describes aspects of antenna and eye size. Indeed, for not only relative measurement phenotypes but in fact all phenotype descriptions we created, the OWL class representing a phenotype for a given character state is defined not as equivalent to the EQ description but rather as a subclass; the provided semantics are necessary aspects of the phenotype but not a wholly sufficient description. To make the intended semantics of relative measurement phenotypes more explicit to consumers of the semantic description, we added a rule for each such character state using Semantic Web Rule Language (http://www.w3.org/Submission/SWRL/ last accessed May 13, 2013).
For the example phenotype above, the corresponding rule would state that if there is an organism which has as parts an antenna and an eye, and the antenna is increased_in_magnitude_relative_to the eye, then that organism is inferred to be a member of the class defining the phenotype. Within our data set, these rules are not actually exercised; they are included only as a clarification of relative measurement semantics within the limitations of the OWL 2 language.
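The gap between the class expression and the rule can be made concrete on toy instance data. The sketch below is illustrative only: the organisms, parts, and the simplified `LONGER_THAN` relation (structure to structure, rather than length quality to length quality) are hypothetical, and plain Python set operations stand in for a reasoner.

```python
# Toy instance data (all names hypothetical).
HAS_PART = {
    "wasp_A": {"antenna_A", "eye_A"},
    "wasp_B": {"antenna_B", "eye_B"},
}
KIND = {
    "antenna_A": "antenna", "eye_A": "eye",
    "antenna_B": "antenna", "eye_B": "eye",
}
# antenna_B is longer only than the eye of a *different* organism.
LONGER_THAN = {("antenna_A", "eye_A"), ("antenna_B", "eye_A")}


def class_expression_members():
    """Tree-shaped reading: an antenna longer than some eye, anywhere."""
    members = set()
    for organism, parts in HAS_PART.items():
        for part in parts:
            if KIND[part] != "antenna":
                continue
            if any(KIND[o] == "eye" and (part, o) in LONGER_THAN for o in KIND):
                members.add(organism)
    return sorted(members)


def rule_members():
    """Rule reading: antenna and eye must be parts of the same organism."""
    members = set()
    for organism, parts in HAS_PART.items():
        for a in parts:
            for e in parts:
                if KIND[a] == "antenna" and KIND[e] == "eye" and (a, e) in LONGER_THAN:
                    members.add(organism)
    return sorted(members)


if __name__ == "__main__":
    print(class_expression_members())  # both organisms qualify
    print(rule_members())              # only wasp_A qualifies
```

The rule version needs two variables that are bound to parts of one organism, which is exactly the kind of shared variable an OWL class expression cannot state.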
Of the 43 annotated characters, 24 fell into the qualitative category whereas 13 described a presence/absence. Only one described a count, whereas five described relative measurements. Only four characters (nos. 14, 38, 39, and 43) significantly departed from the basic templates, usually by incorporating more complicated intersection or union expressions. All SP expressions along with the original natural-language characters can be found in online Appendix 1.
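As a compact recap, the locator chain and the four templates can be written as string builders. This is a sketch under stated assumptions: the helper names are hypothetical, only the forms shown in the text are covered (absence, exact count, directional size difference), and terms containing spaces are quoted as in the Manchester-syntax examples above.

```python
# Hypothetical helpers that emit Manchester-syntax strings for the templates.

def term(label: str) -> str:
    """Quote labels containing spaces, as in the examples in the text."""
    return f'"{label}"' if " " in label else label


def qualitative(entity, quality, locators=()):
    """Template 1, optionally nested under secondary locators (outermost first)."""
    expr = f"{term(entity)} and bearer_of some {term(quality)}"
    for locator in reversed(locators):
        expr = f"{term(locator)} and has_part some ({expr})"
    return f"has_part some ({expr})"


def absent(entity):
    """Template 2, absence form."""
    return f"not (has_part some {term(entity)})"


def count(container, part, n):
    """Template 3; has_component is used because has_part is transitive."""
    return f"has_part some ({term(container)} and (has_component exactly {n} {term(part)}))"


def larger_than(e1, quality, e2):
    """Template 4, directional form."""
    return (f"has_part some ({term(e1)} and bearer_of some ({term(quality)} and "
            f"increased_in_magnitude_relative_to some ({term(quality)} and "
            f"inheres_in some {term(e2)})))")


# Character tallies reported in the text, by template.
TALLY = {"qualitative": 24, "presence/absence": 13, "count": 1, "relative measurement": 5}

if __name__ == "__main__":
    print(qualitative("ventro-lateral region", "foveate", locators=["mesosoma"]))
    print(absent("antennal_shelf"))
    print(count("mandible", "tooth", 3))
    print(larger_than("scape", "length", "compound_eye"))
    print(sum(TALLY.values()))
```

Characters that depart from the templates (nos. 14, 38, 39, and 43) would need hand-written intersection or union expressions and are not covered by these helpers.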

Linking SPs to Character Data and Specimens
A phenotype annotation consists of an OWL class, P, the ontological description of that class using an EQ expression, and the linkage of the phenotype class to a particular character state, CS, by means of a class assertion:

CS Type (denotes only P)
The denotes property is defined within the Information Artifact Ontology (IAO) to signify a reference by an informational entity to a "portion of reality." We made use of denotes to connect both the character states and the Operational Taxonomic Units (OTUs) within a character matrix to the actual organisms (specimens) being described. So, since in our data model an OTU denotes a particular set of specimens under investigation, it follows that a character state denotes any specimen whose OTU has that state as a matrix value. We encoded this assertion by defining an OWL property chain (a property chain describes a path of links in the RDF graph which imply a new direct link between the nodes at either end of the path) (Fig. 2). By connecting phenotype classes to character states using a universally quantified restriction (denotes only P), we can infer that any specimens denoted by OTUs possessing a given character state (and thus denoted also by that character state) are members of that phenotype class.
FIGURE 2. An OWL/RDF model showing explicit semantic links between natural-language character matrix data, an ontological phenotype representation, and a museum specimen with taxonomic metadata. In (a), a character matrix cell (_:coding_2222) is represented using the CDAO, upper half, linked to a museum specimen (urn:catalog:NCSU:NCSU:34852) described with the Darwin-SW ontology for Darwin Core, lower half. An EQ representation of the phenotype denoted by the given character state has been composed using terms from the HAO and the PATO. The denotes property, from the IAO, is used to bridge observational data artifacts (CDAO data elements) to direct descriptions of organisms (as EQ phenotypes). By applying an OWL 2 DL reasoner to the character matrix model, we can infer phenotypic characteristics of associated specimens (dashed arrow) using an OWL property chain (b).
This logical framework allows us to propagate phenotypes to specimens while directly asserting semantic annotations only for character states (Fig. 2).
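The propagation step can be sketched with plain dictionaries. This is a hedged illustration, not the OWL model itself: the OTU, specimen, character-state, and phenotype-class identifiers are hypothetical, and the two functions mimic what a reasoner would derive from the property chain and the "denotes only P" restriction.

```python
# Hypothetical toy matrix: OTUs, the specimens they denote, and their states.
OTU_DENOTES = {
    "otu_A": {"specimen_1", "specimen_2"},
    "otu_B": {"specimen_3"},
}
OTU_HAS_STATE = {
    "otu_A": {"eye_color_blue"},
    "otu_B": {"antennal_shelf_absent"},
}
# Each character state carries the assertion "denotes only P".
STATE_DENOTES_ONLY = {
    "eye_color_blue": "P_blue_eye",
    "antennal_shelf_absent": "P_no_antennal_shelf",
}


def state_denotes():
    """Property chain: a state denotes every specimen denoted by an OTU having it."""
    links = {}
    for otu, states in OTU_HAS_STATE.items():
        for state in states:
            links.setdefault(state, set()).update(OTU_DENOTES.get(otu, set()))
    return links


def infer_phenotypes():
    """Universal restriction: anything a state denotes is a member of P."""
    inferred = {}
    for state, specimens in state_denotes().items():
        phenotype = STATE_DENOTES_ONLY[state]
        for specimen in specimens:
            inferred.setdefault(specimen, set()).add(phenotype)
    return inferred


if __name__ == "__main__":
    for specimen, phenotypes in sorted(infer_phenotypes().items()):
        print(specimen, sorted(phenotypes))
```

Only the character states carry phenotype annotations in the input; the specimen-level memberships appear solely in the inferred output, mirroring the division of labor described above.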

Ontology Concept Usage
As preliminary examples for how ontology-based annotation facilitates cross-data set computation, we performed queries, programmed using the OWL API, over both this data set and a comparison data set for a relatively distantly related genus, Evaniscus. Because the two data sets make use of shared, community reference ontologies, we were able to directly compare concept (i.e., OWL class) usage across the two studies (Table 2). Although the two taxonomic descriptions of wasp genera were conducted by members of the same research group, the number of referenced ontology classes found in both studies was small relative to the total: 22 out of 100 hymenopteran anatomy concepts, 16 out of 54 quality descriptors, and only 4 out of 24 biospatial concepts. The Szepligetella data set (this study) referenced a markedly broader range of quality descriptors than the Evaniscus data set: 41 concepts across 43 characters versus 30 concepts across 56 characters. We also used the ontologies to assess the distribution of study characters across selected anatomical and qualitative partitions (Table 3). By means of an automated reasoner (FaCT++, driven by our OWL API-based scripts), we queried for characters describing any structure known to be part of the region of interest, for example, "head." As expected, based on our reading of current and past hymenopteran taxonomy literature, a large proportion of the characters in both studies concern features of the head, thorax, and integument. Although comparable proportions of characters in both studies described types of shape, size, and, to a lesser extent, texture, within the Evaniscus data set characters describing color variation are much more prevalent than in this study (20% vs. 5% of all characters).
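The two kinds of summary query can be approximated in a few lines. The sketch below is an assumption-laden stand-in for the Scala, OWL API, and FaCT++ programs used in this study: the concept sets, the character list, and the `PART_OF` hierarchy are toy data that are not taken from the HAO, and a simple walk up the hierarchy replaces reasoner-based classification.

```python
# Toy part_of hierarchy (hypothetical; not copied from the HAO).
PART_OF = {
    "scape": "antenna",
    "antenna": "head",
    "compound_eye": "head",
    "tooth": "mandible",
    "mandible": "head",
    "petiole": "metasoma",
}


def compare_usage(study_a: set, study_b: set) -> dict:
    """Count concepts referenced by both studies against all referenced concepts."""
    return {"shared": len(study_a & study_b), "total": len(study_a | study_b)}


def within(structure: str, region: str) -> bool:
    """True if structure is the region itself or a (transitive) part of it."""
    current = structure
    while current is not None:
        if current == region:
            return True
        current = PART_OF.get(current)
    return False


def characters_about(region: str, character_entities: dict) -> list:
    """Characters whose annotated entity falls within the region of interest."""
    return sorted(c for c, e in character_entities.items() if within(e, region))


if __name__ == "__main__":
    study_a = {"scape", "compound_eye", "mandible", "petiole"}
    study_b = {"compound_eye", "mandible", "tooth"}
    print(compare_usage(study_a, study_b))

    characters = {
        "eye color": "compound_eye",
        "mandibular teeth count": "tooth",
        "scape length": "scape",
        "petiole shape": "petiole",
    }
    print(characters_about("head", characters))
```

Because both studies reference the same ontology identifiers, the comparison reduces to set intersection, and the partition query reduces to following parthood links upward from each annotated entity.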

Remarks
The two putative syntype specimens deposited at LSUK are not conspecific. The male specimen (LINN 2719) is morphologically consistent with the concept of E. appendigaster used in the majority of the literature, whereas the female specimen (LINN 2720) is easily diagnosed, based on surface sculpture, mesosoma shape, appendage color, and wing venation, as Prosevania fuscipes (Illiger 1807). We wait to designate a lectotype, however, until these specimens can be observed directly.

Etymology
The species name refers to the collector of the type specimens, M. E. Irwin.

COMMENTS ON THE ENSIGN WASPS OF NEW CALEDONIA
There are likely >100 species of cockroaches on New Caledonia that construct egg cases (P. Grandcolas, in litt.) which could serve as "hosts" for ensign wasp larvae. Yet, we could only find two species of evaniid that are presumably native to the island and one introduced species, whose hosts are well known. The two native ensign wasp species could be (i) generalists that prey on numerous cockroach species (as has been discussed for other evaniids; see Deyrup and Atkinson 1993), (ii) cryptic species complexes that could not be distinguished during our study, and/or (iii) part of a larger fauna of microendemic species, most of which have yet to be collected and described. We hope this revision catalyzes future efforts to elucidate the natural history of the New Caledonian ensign wasp fauna.

Phenotype Annotation
Although semantic annotation of evolutionary character matrices has been initiated post hoc for some published studies (Dahdul et al. 2010a), this article and Mullins et al. (2012) are the first publications to provide both new taxonomic descriptions and corresponding semantic data. We aim to take the first steps along the path outlined by Deans et al. (2012), toward the creation of computable and reusable phenotypic data as a product of taxonomic studies. The ontological annotation of 43 characters here, and 56 in Mullins et al. (2012), resulted in a rich, queryable OWL data set, revealed several challenges inherent in representing natural phenotypes in OWL, and suggested means to better facilitate the application of semantic technologies within systematics.

SP Expressivity
A critical consideration for the application of semantic technology to phenotypic descriptions is how well a logical representation can express the meaning currently conveyed by NL. For example, creating logical statements that encapsulate the full meaning of the single NL statement "aedeagus apical column broad, and flattened apically, ending abruptly with rounded basal angle and narrow, hooked apical lobe, column extending more than half its length beyond penis valve, bending dorsally," is a much larger challenge relative to the statement "gonocoxa orange" (both statements from Baptiste and Kimsey 2000). This will depend on the logical expressivity of the knowledge representation system being used (the types of statements it is possible to make), and also how straightforward it is for scientists to apply features of the language to their descriptions. We focus our approach on the OWL family of knowledge representation languages for several reasons. First, as the ontology language of the semantic web, OWL identifiers are URIs: global identifiers suitable for publishing and referencing data directly on the web. Second, OWL is a free standard with a formal description logic foundation; this has allowed the development of multiple freely available, compatible, automated reasoning systems, each with its own strengths and weaknesses. There is an active community of users developing tools for working with OWL. Finally, OWL is being used for many life sciences semantic applications, and the W3C (World Wide Web Consortium) has actively considered use cases from the life sciences community in the development of semantic web technologies, for example, through the Semantic Web Health Care and Life Sciences Interest Group (http://www.w3.org/blog/hcls/ last accessed May 13, 2013). Mungall et al. (2007) provided an OWL model for phenotypes based on the EQ approach. 
They found that composing phenotypes as OWL class expressions works well, with minor exceptions, for a variety of phenotypes, promoting modular reuse of ontologies and rigorous reasoning. An application of the basic EQ model to more than 4600 characters from the phylogenetic systematic literature found phylogenetic characters to be very amenable to EQ description (Dahdul et al. 2010a). However, one challenge noted in that study was the difficulty of annotating the finer aspects of the quality portion of the EQ for many characters. This was addressed by selecting a predefined set of "upper" quality terms from the PATO ontology, which curators used to provide a "coarse granularity" of phenotype annotation. This coarse annotation accelerated curation but still provided useful classification of phylogenetic characters.
Building on these forays into EQ phenotype annotation, we found that the expressivity of OWL DL was adequate for most of our characters. The OWL expressions lack some of the subtlety possible in natural-language descriptions, but because we defined our characters with semantic annotation in mind (i.e., we forced ourselves to describe character states explicitly from the outset), EQ translation was generally straightforward. However, as described in the "Results" section, one class of characters, "relative measurements," proved challenging to adequately describe using OWL. Five out of the 43 characters described here involved relative measurements. Although they constitute only a little more than 10% of the characters in this study, relative measurements are a common and important means for describing morphological changes in a size-independent manner. The OWL limitation stems from a requirement that class descriptions exhibit a tree-like structure, a factor in decidability (Motik et al. 2009). Although a semantic model of "instance-level" structure measurements of a particular organism can be adequately constructed using class and property assertions on a graph of OWL instance nodes, the general definition for a class of all organisms with a given relative measurement cannot be fully described in OWL 2 (Magka et al. 2012). Although our relative measurement annotations do not capture the full meaning intended by each character, we feel that even with these limitations they provide useful semantic context for the characters. The classes of structures involved (e.g., "metapectus" and "mesopectus"), and the quality being compared (e.g., "length"), are captured within the semantic annotation, facilitating useful ontology-driven queries over the characters. 
Development of better reasoning approaches for semantic data is an active area of research, and the tree-model restriction of OWL is being specifically addressed, as it is a limitation for other scientific descriptions such as that of chemical structures (Hastings et al. 2010). Future versions of OWL (or a related language) involving "description graphs" may allow much more complete representation of, and more powerful reasoning with, these characters (Magka et al. 2012).
With representational limitations in mind, we suggest that semantic annotation should be primarily considered as a means to classify NL phenotypic descriptions, rather than fully replace them. In contrast, Vogt et al. (2010) advocate direct representation of morphological data using RDF statements. It may be that such an approach, with the right tools, becomes feasible for "instance-level" description (above), whereas complete definition of character-state classes, as explored here, is less so. By reference to shared ontologies, both approaches become interoperable. Semantic annotations can describe the implications of a character state for denoted organisms as far as is feasible, allowing inference of implied facts and facilitating search through logical subsumption and semantic similarity approaches. Although semantic annotations as provided here may not represent every detail expressed in each character state, they should also not result in any unintended inferences.
Indeed, the ontological classification of characters in this study and in Mullins et al. (2012) allowed straightforward comparison of anatomical coverage of the phenotypic descriptions in each (Tables 2 and 3). Although these analyses are admittedly simplistic, as the number of available semantic data sets increases, more ambitious use cases will become feasible. Semantic similarity approaches (e.g., Washington et al. 2009) across Metazoa will be facilitated by further development of comprehensive multispecies anatomy ontologies such as Uberon, which currently provides a coherent ontological structure, to varying degrees of granularity, for the anatomy of many animal groups including vertebrates, insects, and even echinoderms.
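The subsumption-based search and semantic similarity comparisons discussed above can be illustrated with a small, self-contained Python sketch. The parent links and character names below are a toy hierarchy invented for illustration and do not reproduce the HAO or Uberon; the Jaccard index over ancestor sets is a deliberately simple stand-in and is not the similarity measure of Washington et al. (2009).

```python
# Toy parent links for illustration only; this is NOT the actual HAO hierarchy.
PARENTS = {
    "fore wing": ["wing"],
    "hind wing": ["wing"],
    "wing": ["appendage"],
    "antenna": ["appendage"],
}

# Hypothetical characters, each annotated with one anatomical entity.
CHARACTER_ENTITY = {
    "character_1": "fore wing",
    "character_2": "hind wing",
    "character_3": "antenna",
}


def ancestors(term, parents):
    """Return the term itself plus every class reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def characters_about(query, character_entity, parents):
    """Find characters whose annotated entity is subsumed by the query class."""
    return sorted(
        name
        for name, entity in character_entity.items()
        if query in ancestors(entity, parents)
    )


def jaccard_similarity(term_a, term_b, parents):
    """Share of ancestor classes common to both terms (a simple similarity)."""
    a = ancestors(term_a, parents)
    b = ancestors(term_b, parents)
    return len(a & b) / len(a | b)


print(characters_about("wing", CHARACTER_ENTITY, PARENTS))
print(jaccard_similarity("fore wing", "hind wing", PARENTS))
print(jaccard_similarity("fore wing", "antenna", PARENTS))
```

A query for the general class retrieves characters annotated with any of its subclasses, which is the behavior that makes coverage comparisons across studies possible without the studies sharing identical terms.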

Annotation Workflow
It is clear that widespread adoption of semantic methodologies within systematics will require development of tools that facilitate, rather than complicate, systematists' work. The initial approach demonstrated here requires some familiarity with both OWL and the Protégé application. Our cdaoprotege plugin made navigation of characters and associated states within Protégé fairly straightforward; however, the mechanics of manually creating ontology classes for phenotypes and consistently creating all the required links to character states did prove challenging to nonexpert Protégé users. Even so, it is our experience that interested biologists can quickly comprehend and apply OWL class descriptions, particularly using the English-like Manchester Syntax. We have prototyped a fully integrated SP annotation interface directly within mx, building on the approach begun with the Phenex annotation application (Balhoff et al. 2010). However, we found that more experience with the composition of real annotation results, created in the unconstrained environment of Protégé, would be required to better assess the requirements of such a system for taxonomy. These data sets will provide the basis for further work in that area.
Direct creation of phenotypic class expressions within Protégé provides the annotator a maximum of freedom to express the meaning of the phenotype as closely as possible; with this freedom comes the possibility that the annotator may create unintended logical inferences, whether through mistakes or lack of ontology expertise. An alternative approach we plan to explore further is a simpler ontology "tagging," wherein high-throughput text-processing systems, for example, Textpresso (Müller et al. 2004) or CharaParser (Cui 2010, 2012), can be used to more quickly identify terms within phenotype descriptions that can be matched to ontology classes. This approach would support basic query answering over character descriptions using anatomy and quality ontologies, but, by removing the "internal" semantics of the phenotype, would eliminate some use cases requiring more sophisticated inferences about organisms, such as analysis of presence/absence of structures across evolution. For Hymenoptera specifically, we have provided a solution that lies between the complexity of producing SPs and traditional treatments alone. Hymenopterists can use the "analyze" functionality at the HAO portal (http://portal.hymao.org; last accessed May 13, 2013) to compare their descriptive text against the HAO and to return ontology concept URIs (Seltmann et al. 2012).
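To make the idea of ontology "tagging" concrete, the following Python sketch matches term labels in a free-text description against a small lexicon. It is a simplified dictionary lookup and does not reproduce how Textpresso, CharaParser, or the HAO portal work; the lexicon, the `EX:` identifiers, and the sample sentence are placeholders invented for illustration.

```python
import re

# Placeholder label-to-identifier map; real identifiers would come from
# anatomy and quality ontologies such as the HAO and PATO.
LEXICON = {
    "fore wing": "EX:0001",
    "wing": "EX:0002",
    "petiole": "EX:0003",
    "length": "EX:0004",
}


def tag_terms(text, lexicon):
    """Return (label, identifier) pairs for each lexicon term found in the text.

    Longer labels are listed first in the pattern so that, when two labels
    start at the same position, the more specific one is preferred.
    """
    labels = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (match.group(1).lower(), lexicon[match.group(1).lower()])
        for match in pattern.finditer(text)
    ]


# Hypothetical description text.
description = "Fore wing longer than petiole; petiole length variable."
for label, identifier in tag_terms(description, LEXICON):
    print(label, identifier)
```

Note that the tags identify which structures and qualities are mentioned but say nothing about how they relate to one another (e.g., which structure is longer), which is the loss of "internal" semantics noted above.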
Either approach, careful SP description or simple semantic tagging, requires a well-developed multispecies anatomy ontology for maximum utility. The importance of expert morphologists has never been greater in this regard (Dahdul et al. 2010b; Yoder et al. 2010

Dissemination
As the ontology language for the Semantic Web, OWL not only provides an ontological reasoning framework but also a means to publish our descriptions as RDF, contributing to the emerging universe of Linked Open Data (Bizer et al. 2009). The use of shared, community-driven ontologies, containing standard concepts with global identifiers, promotes integration across data sets (Washington et al. 2009; Mungall et al. 2010). Beyond ontology concepts, RDF allows us to provide every data element with its own global URI, including characters, states, and OTUs. Publishing descriptive data in this way on the Semantic Web should facilitate explicit reuse of characters across studies: anyone can code another taxon for published characters in a way that can be seamlessly integrated with existing data. As character-centric semantic datastores such as the Phenoscape Knowledgebase (Mabee et al. 2012) expand in scope, richly annotated taxonomic descriptions will be ripe for inclusion.
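The reuse of published characters through global URIs can be sketched as follows. This Python fragment writes N-Triples-style statements by hand, using only the standard `rdf:type` predicate; the `example.org` namespaces, the `has_state` and `state_of` properties, and all identifiers are hypothetical and do not correspond to the URIs or vocabulary of the data files deposited with this paper.

```python
BASE = "http://example.org/evaniidae/"   # hypothetical data namespace
VOCAB = "http://example.org/vocab#"      # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(kind, local_id):
    """Build a global URI for a character, state, or OTU."""
    return f"{BASE}{kind}/{local_id}"


def triple(subject, predicate, obj):
    """Format one statement in N-Triples syntax (URI object only)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def code_taxon(otu_id, character_id, state_id):
    """Return statements asserting that an OTU exhibits one state of a character."""
    otu = uri("otu", otu_id)
    character = uri("character", character_id)
    state = uri("state", state_id)
    return [
        triple(otu, RDF_TYPE, VOCAB + "OTU"),
        triple(state, VOCAB + "state_of", character),
        triple(otu, VOCAB + "has_state", state),
    ]


# An original study codes one OTU; a later, independent study codes another
# OTU for the same published character simply by reusing its URI.
original = code_taxon("otu_A", "c1", "c1_s0")
later = code_taxon("otu_B", "c1", "c1_s1")

for line in original + later:
    print(line)
```

Because both sets of statements point at the same character URI, merging them requires no further mapping step, which is the sense in which newly coded taxa can be integrated with existing data.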

SUPPLEMENTARY MATERIAL
Data files and/or other supplementary information related to this paper have been deposited at Dryad under http://dx.doi.org/10.5061/dryad.2gd84.

FUNDING
This material is based on work supported by the National Science Foundation (grant numbers DBI-0850223, DEB-0842289, DEB-0956049, and EF-0905606); and the National Evolutionary Synthesis Center and benefited from discussions initiated through the Phenotype Research Coordination Network (in part). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.