Abstract

A goal of the biodiversity research community is to digitize the majority of the one billion specimens in US collections by 2020. Meeting this ambitious goal requires increased collaboration, technological innovation, and broader engagement beyond the walls of universities and museums. Engaging the public in digitization promises to both serve the digitizing institutions and further the public understanding of biodiversity science. We discuss three broad areas accessible to public participants that will accelerate research progress: label and ledger transcription, georeferencing from locality descriptions, and specimen annotation from images. We illustrate each activity, compare useful tools, present best practices and standards, and identify gaps in our knowledge and areas for improvement. The field of public participation in digitization of biodiversity research specimens is in a growth phase with many emerging opportunities for scientists, educators, and the public, as well as broader communication with complementary projects in other areas (e.g., the digital humanities).

Worldwide, there are approximately three billion curated biodiversity research specimens, hereafter referred to simply as specimens, including a wide variety of samples from extant and extinct organisms, in the collections of museums, universities, government agencies, and research centers (Beach et al. 2010). These specimens and their data represent irreplaceable legacy information about our biosphere in an era dominated by planetary-scale anthropogenic change (Walther et al. 2002, Parmesan and Yohe 2003) and unprecedented biodiversity loss (Jenkins 2003, Loreau et al. 2006, Wake and Vredenburg 2008). Already, this rich record has been used to benchmark the biological impacts of environmental change and to elucidate causal factors (Moritz et al. 2008, Rainbow 2009, Erb et al. 2011, Everill et al. 2014). It also provides a unique resource for educators to teach core bioscience topics, as has been recently highlighted by the activities of the Advancing Integration of Museums into Undergraduate Programs (AIM-UP!) Research Coordination Network (Cook et al. 2014). In order for biocollections to be used to their full potential by researchers, policymakers, educators, and the public, there must be widespread access to the data they contain (Ehrlich and Pringle 2008, Parr et al. 2012). However, only about 10% of the roughly one billion specimens in the United States have been digitized with information available online (Beach et al. 2010). Continued, broadscale digitization of specimens (including databasing, georeferencing, and digital imaging) and supporting source materials (e.g., field collection notebooks) is crucial for overcoming this impediment, but the scope and effort required to bring about a digitized biocollections commons are immense. Earlier projects that have engaged the public in online tasks in other areas of science suggest that such approaches might be used to accelerate specimen digitization. For example, to process the hundreds of thousands of images produced by the Sloan Digital Sky Survey, scientists developed Galaxy Zoo (http://galaxyzoo.org) and received more than 40 million classifications from 100,000 public volunteers (Lintott et al. 2008). Here, we provide an overview of how online engagement of the public can advance digitization in three activities—transcribing specimen label and ledger text, georeferencing collection localities, and annotating specimens—and how this engagement can lead to a deeper public understanding of biodiversity science.

Public participation in the generation and communication of knowledge in the sciences (Bonney et al. 2009), humanities (Dunn and Hedges 2013), and other areas (such as may be seen in Wikipedia, www.wikipedia.org, and OpenStreetMap, http://openstreetmap.org) has become increasingly important (Bonney et al. 2014, Kelty et al. 2014). Public participation is also known as citizen science (when scientists collaborate with the public) or crowdsourced science (in which contributions are made by a large, usually online and occasionally paid community of individuals; Wiggins and Crowston 2011). In the sciences, the need for a formalization of practice related to public participation and the establishment of supporting infrastructure has been met by several recent organizational developments. The Human Computation and Crowdsourcing meetings began as annual workshops sponsored by the Association for the Advancement of Artificial Intelligence in 2009 and became an annual conference in 2013. The biennial Citizen Cyberscience Summit in the United Kingdom began in 2010 and is focused on Internet-deployed public engagement projects. In August of 2012, Frontiers in Ecology and the Environment released a special issue of their journal entitled Citizen Science—New Pathways to Public Involvement in Research. That same month, a two-day Public Participation in Scientific Research (PPSR) workshop was convened at the Ecological Society of America annual conference to gather together science researchers, project leaders, educators, technology specialists, evaluators, and others representing diverse disciplines (including astronomy, molecular biology, human and environmental health, and ecology) to specifically discuss the formalization of the field of PPSR (Benz et al. 2013). Activities at that meeting established the foundation for the newly formed Citizen Science Association and a nascent journal. Other workshops have focused on more narrow implementations of PPSR, including iDigBio's Public Participation in Digitization of Biodiversity Specimens Workshop in September 2012 (http://idigbio.org/content/public-participation-digitization-biodiversity-specimens-workshop-report) and the CITSCribe Hackathon (cosponsored by iDigBio and Notes from Nature; http://idigbio.org/content/citscribe-hackathon) in December 2013 to improve online specimen label and ledger text transcription.

The most frequent steps in the initial digitization of specimens have been described as five discrete task clusters (Nelson et al. 2012): predigitization curation and staging, specimen image capture, specimen image processing, electronic data capture, and georeferencing locality descriptions. The first three of these tasks are largely limited to onsite participation, because that is the location of specimens (required by the first two) or the large image files (which are involved in the third). The last two task clusters can be performed onsite or online offsite. Predigitization curation is generally the first step in a digitization workflow and encompasses preparing specimens for data entry, imaging, or both. Some of these tasks include applying barcodes or other identifiers to collection objects, updating taxonomic determinations and nomenclature, transporting specimens to digitization stations, cleaning specimens for imaging, and routing damaged specimens to a conservation workflow. Although the first three task clusters can benefit substantially from onsite public participation, we focus here on those activities that can be deployed online, where the number of potential participants is greater, because it is less limited by monetary and physical constraints such as those related to onsite supervising personnel, workspace, and parking. Improvements and advancements made to online digitization tools for public participation might also lead to their widespread use onsite by paid staff.

iDigBio's Public Participation in Digitization of Biodiversity Specimens Workshop participants recognized 26 digitization activities in which the public could participate, some of which fit neatly into the last two task (i.e., activity) clusters of Nelson and colleagues (2012) described above and others that occur after the initial digitization of the specimen data and its subsequent deployment online. Given parallel advances in framing public participation in the digital humanities (Dunn and Hedges 2013), we have organized these tasks within the Dunn and Hedges (2013) typology (table 1). Dunn and Hedges’ (2013) framework emerged from a literature review, two workshops on the topic, an online survey of contributors to crowdsourcing projects, and interviews with contributors and consumers of the data. They propose a framework for thinking about these projects in which “a process is composed of tasks through which an output is produced by operating on an asset” (p. 156). Dunn and Hedges (2013) identified twelve processes. The iDigBio workshop occurred prior to the publication of Dunn and Hedges’ (2013) typology, but its participants independently identified activities that fall under 11 of the 12 processes (table 1). The process for which corresponding digitization activities were not identified—commenting, critical responses, and stating preferences—could become important in future education and outreach activities. The independent and nearly simultaneous determination of these similar activities by iDigBio workshop participants and Dunn and Hedges (2013) demonstrates the timeliness of the topic and the opportunity for increased connectivity between the fields of biodiversity sciences and humanities.

Table 1.

Digitization activities identified by the participants of iDigBio's Public Participation in Digitization of Biodiversity Specimens Workshop organized by the twelve crowdsourcing processes recognized by Dunn and Hedges (2013) for the humanities.

Process Activity 
Transcribing • Into appropriate database fields. 
Cataloging • Overlaps broadly with other processes (e.g., transcribing and georeferencing); identified by the production of structured, descriptive metadata. 
Translating • Between a nonnative language and the native language (e.g., between Chinese and English in the United States). 
Georeferencing • Assign latitude and longitude and measures of precision to collection localities not previously described in that way. 
Recording and creating content • Provide location and other information on historical place names used in collection locality descriptions. 
Mapping • Production of maps useful for identifying outliers that might be due to errors or something that is biologically interesting. 
 • Production of maps useful for citizen science research. 
Tagging • Taxonomic identity. 
 • Phenological state or life stage. 
 • Existing disease, herbivory, parasite, etc., at collecting event. 
 • Damage (e.g., from insects) following collecting event. 
 • Entity–quality statements (e.g., the flower is red). 
 • Landmarks for morphometric analysis. 
 • Scientific significance of specimen (e.g., unrecognized type specimen). 
 • Other significance of specimen (e.g., to history). 
 • Digitization process errors (e.g., image file named incorrectly). 
Categorizing • Any of the collaborative tagging activities where the descriptive categories that may be used are constrained. 
Linking • Determine if similar specimens are duplicate collections (exsiccatae; often at different institutions, sometimes with different annotation histories). 
 • Determine if similar records in different data sources are from the same specimen (e.g., from a biodiversity research collections data management system and GenBank). 
Contextualization • Associate specimens with the legacy scientific literature that cites them. 
 • Associate specimens with field collection notebook pages that cite them. 
 • Associate specimens with ongoing scientific research that uses them (e.g., in a citizen science content management system). 
Commenting, critical responses, and stating preferences • None identified. 
Correcting or modifying content • Any content generated by a collection's staff, public participants, or automation (e.g., optical character recognition or automated georeferencing). 
 • Identification of outliers in appearance (e.g., from images of specimens identified as same taxon). 
 • Identification of outliers in georeferenced data (e.g., from map of all localities for a single taxon or from map of all localities visited by a collector in a single day, week, or month). 
 • Identification of outliers in other data (e.g., habitat descriptions); any of these outliers could be scientifically interesting, rather than an error. 

Note: These activities also align with the classifications of the initial digitization task clusters: predigitization curation and staging, specimen image capture, specimen image processing, electronic data capture, and georeferencing locality descriptions (Nelson et al. 2012). The activities are all assumed to be occurring online from digital content (images or text).

Here, we focus on three broadly defined digitization activities that encompass what we consider to be the core digitization activities in table 1: transcription, georeferencing, and annotation. As we will discuss, these overlap with Nelson and colleagues’ (2012) electronic data capture and georeferencing task clusters and Dunn and Hedges’ (2013) processes of transcribing, cataloging, georeferencing, collaborative tagging (which we shorten to tagging), and categorizing. We recognize that, in most cases, only transcription is a literal capture of data and that georeferencing and annotation often involve more substantial interpretation of specimen data.

A well-designed citizen science project can also contribute to science literacy goals. By providing opportunities to learn about scientific processes, experimental design, focal species, and data analysis (Bonney et al. 2009, Jordan et al. 2011, Whitmer et al. 2010), these projects provide educational benefits that are not possible through outsourcing the digitization to private companies such as Amazon (http://mturk.com) or Crowdflower (http://crowdflower.com). These benefits can be gained in formal classroom settings or in informal settings. The design and supplementary materials for online digitization activities in a classroom setting can emphasize foundational areas in the Next Generation Science Standards (National Research Council 2012), including scientific and engineering practices, crosscutting concepts, and disciplinary core ideas. ZooTeach (http://zooteach.org) is a repository for K–16 educational materials that use Zooniverse's citizen science tools (Masters 2013). Participants in informal and online learning experiences are diverse and include all ages, cultural and socioeconomic backgrounds, abilities, knowledge, and educational backgrounds. Their experiences are characterized as being self-motivated, guided by their own interests, voluntary, personal, embedded in a context, and open-ended (Falk and Dierking 2000, Falk et al. 2001, National Research Council 2009). These experiences provide crucial lifelong learning opportunities to increase science awareness, appreciation, interest, and understanding, and different types of digitization programs and activities can achieve a variety of learning outcomes.

Despite successful scientific advancements (e.g., Lintott et al. 2008), critics of these approaches cite data quality as a primary concern about the use of citizen science data (Penrose and Call 1995, Nerbonne and Vondracek 2003). In addition, citizen science is not well suited to all facets of scientific applications and workflows (Dickinson et al. 2010, Kremen et al. 2011). Description of data quality has been formalized in the areas of transcription (Hill et al. 2012) and georeferencing (e.g., the National Standard for Spatial Data Accuracy; http://fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/part3/index_html). Training (Dickinson et al. 2010), deliberate program design (Shirk et al. 2012), flexible and multiscale data management systems (Newman et al. 2011), well-chosen data validation protocols (Bonter and Cooper 2012), and rigorous statistical techniques that handle sampling bias and random error (Bird et al. 2014) are known to collectively improve citizen science data quality. Furthermore, increased attention to citizen science data quality should make it more likely that such data are critically evaluated. We expect that these approaches will be valuable as data quality is addressed in this early stage of public participation in the digitization of specimens.

Following on this history and these goals, we present each of the three broadly defined areas of digitization (transcription, georeferencing, and annotation), explain and illustrate the activity, identify competencies and training emphases that will lead to the most efficient and accurate results, compare existing tools, identify relevant best practice and standards, and recognize gaps in our knowledge and opportunities for improvement.

Online activity 1: Transcribing specimen label and ledger text

Online activity 1 involves two processes from Dunn and Hedges’ (2013) typology: transcription (creating machine-readable text that reflects the textual content of the specimen label or ledger; sometimes called text encoding) and cataloging (the production of structured, descriptive metadata about the text). We will discuss both of these processes as the activity of transcription, as is common in the biodiversity research collection domain.

Overview

To date, this activity is most commonly completed onsite by paid technicians in one step: typing (or occasionally reading) the text into appropriate fields in their institution's specimen data management system (Nelson et al. 2012). These technicians have been trained to systematically catalog the often complex and variable labels and ledgers found in their home biodiversity research collection. Recently, however, there has been parallel development of tools for the semiautomation of both of the relevant processes and online tools to engage the public in the activities.

The semiautomation of the task separates the two processes: optical character recognition (OCR) can be used for the transcription of typewritten (or printed) text that has been captured in digital images, and applications such as SALIX (Barber et al. 2013), LABELX (Heidorn and Zhang 2013), and those developed by Apiary (apiaryproject.org) can automatically produce structured data from the OCR text strings. A human typically then proofreads the output of these automated workflows. OCR has a low accuracy rate with handwritten text and even in some cases of typewritten text (e.g., faded ink; Barber et al. 2013). This reduces the value of automation in the transcription process and, therefore, also the cataloging process that follows it, but it is worth noting that LABELX uses a fuzzy matching algorithm to accommodate OCR errors with some success (Heidorn and Zhang 2013). In the case of the Arizona State University Herbarium, at least, the use of a semiautomated workflow involving OCR and SALIX led to a higher overall transcription or cataloging rate for a mix of specimen labels (in terms of expected OCR performance) than did typing the transcriptions into database fields from a similar mix of specimen labels (26.3 records per hour versus 20.4 records per hour by experts, respectively; Barber et al. 2013). Clearly, semiautomation of this task merits further development. However, anecdotal evidence reported by Barber and colleagues (2013) suggested that typing the transcriptions into database fields is the more efficient workflow when the specimen labels are short and OCR performance is in the low part of the observed distribution for that particular herbarium. The relative collection-wide efficiency of typing versus semiautomated workflows might be different for collections with mostly short specimen labels and text features that lead to low OCR performance (e.g., insect collections in which the text is often imaged at an oblique angle and the labels are often stacked on one another). Both online transcription and annotation (online activity 3) require digital images—of all relevant labels or ledgers in the case of the former and the specimen in the case of the latter.
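
To make the cataloging step concrete, the following is a minimal sketch of how structured fields might be pulled from an OCR text string, in the spirit of SALIX and LABELX but not reproducing either implementation; the regular expressions, field names, and example label are illustrative assumptions, and anything that cannot be matched is flagged for human proofreading.

```python
import re

# A minimal sketch of the cataloging step (not the SALIX or LABELX implementation):
# pull a few structured fields out of an OCR text string and flag anything that
# could not be matched for human proofreading. Patterns and field names are
# illustrative assumptions only.
def parse_label(ocr_text):
    record = {"verbatim": ocr_text, "needs_review": []}

    # A "12 Jun 1957"-style collection date.
    date = re.search(r"\b(\d{1,2})\s+([A-Z][a-z]{2,8})\.?\s+(\d{4})\b", ocr_text)
    if date:
        record["verbatimEventDate"] = date.group(0)
    else:
        record["needs_review"].append("eventDate")

    # A collector name preceded by "Coll." or "Leg.", a common label convention.
    collector = re.search(r"(?:Coll\.|Leg\.)\s*([A-Z][A-Za-z.\- ]*[A-Za-z])", ocr_text)
    if collector:
        record["recordedBy"] = collector.group(1)
    else:
        record["needs_review"].append("recordedBy")

    return record

example = "Pinus palustris Mill.  Washington Parish, Louisiana.  Coll. J. Smith  12 Jun 1957"
print(parse_label(example))
```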

In our experience, public participants can be expected to be most efficient and accurate at the transcription activity when they are proficient typists and can read the language in which the label was written. Personal attributes that also benefit any of these digitization activities include attention to detail, patience, dedication, and a desire to make a difference or contribution. Useful emphases in training for the task can be placed on skills relevant to the basic understanding of specimen labels such as interpreting common scientific jargon, abbreviations, label formats, and variability in dates (ordering of month–day versus day–month in different cultures), as well as standard markup for capturing annotations, deletions, and markings in the original text. Equally important is training in how to handle label information that requires further judgment such as when to type the element verbatim and when some interpretation may be used (e.g., when common words are misspelled), how to handle inconsistencies (e.g., when the city given is not found in the state given or country names that have changed over time), and identifying targeted data elements and selecting the appropriate element when multiple similar elements exist (e.g., from among the scientific names on the original label and later annotation labels). A set of specimen labels or ledger entries can vary substantially in legibility, information content, and consistency, and training examples need to adequately represent that variation.
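
One of the judgment calls described above, the ambiguity of month–day versus day–month ordering, can be handled by capturing the date verbatim and recording an interpretation only when the ordering is unambiguous. The helper below is a hypothetical sketch; the field names are illustrative and not drawn from any particular transcription tool.

```python
# A hypothetical helper illustrating the verbatim-versus-interpretation judgment
# call for numeric dates: capture the text exactly as written and record an
# interpretation only when the day/month order is unambiguous.
def interpret_numeric_date(verbatim):
    parts = verbatim.split("/")
    first, second = int(parts[0]), int(parts[1])
    if first > 12 >= second:
        order = "day-month"   # e.g., 23/04/1957 can only be 23 April 1957
    elif second > 12 >= first:
        order = "month-day"   # e.g., 04/23/1957 can only be 23 April 1957
    else:
        order = None          # e.g., 03/04/1957: leave verbatim and flag for a reviewer
    return {"verbatimEventDate": verbatim, "inferredOrder": order}

print(interpret_numeric_date("03/04/1957"))   # ambiguous
print(interpret_numeric_date("23/04/1957"))   # unambiguous day-month ordering
```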

Many online tools engage the public in transcription of this type. Some of the better-known tools include the Atlas of Living Australia's DigiVol (http://volunteer.ala.org.au), Zooniverse's Notes from Nature (notesfromnature.org), Herbaria@Home (http://herbariaunited.org/atHome), Smithsonian Digital Volunteers (http://transcription.si.edu), Les Herbonautes from the National Herbarium in France (http://herbonautes.mnhn.fr), and Discover Life's Time Machine (http://discoverlife.org/timemachine). These tools are similar in that each provides a specimen label or ledger image viewer and targets a shared subset of data elements (e.g., taxonomic identification, collection date; figure 1). They differ most significantly in whether the tasks are packaged into subprojects (called expeditions in DigiVol), the incentives for participation, the ability to discuss tasks with reference to individual specimens, the means by which the area of interest is shown in the image, the number of entry fields displayed on the page at a time, and the validation of the entries (transcription by one user followed by validation by another versus multiple transcriptions that are later reconciled; table 2).

Figure 1.

Example transcription interfaces. The Atlas of Living Australia's DigiVol has an interface that permits zooming and panning of the image at any time; all targeted fields for digitization are displayed at once. Zooniverse's Notes from Nature has an interface that requires the user to choose a portion of the image for zooming, and that portion remains static through the transcription. In Notes from Nature, one field is requested at a time, but users can return to earlier fields.

Table 2.

Online tools for public participation in transcription of biodiversity specimen labels and field notebooks. Characteristics of each are described as applicable according to the given category. Values are valid as of February 2015, unless otherwise noted.

Atlas of Living Australia's DigiVol. Taxonomic, geographic, and object type focus: Life; global, but especially Australia; specimens and field notebooks. Training: Onsite tutorials and forum. Incentives: Recognition of every individual's contributions to each expedition, as well as those making the greatest contribution. Contributors: 860. Transcriptions: 130,816. Interface: Zoom and pan in window or in separate window; all fields seen at once. Validation process: Each task has one transcription and one validation (proofread by an experienced transcriber).

Zooniverse's Notes from Nature. Taxonomic, geographic, and object type focus: Life; global, but especially the United States; specimen labels and field notebooks. Training: Onsite instructions and forums. Incentives: Badges earned upon completion of a certain number of transcriptions. Contributors: 6,833. Transcriptions: 1,042,592. Interface: Drag box around label, label appears in window; one field shown at a time. Validation process: Four participants enter data for each specimen with postprocessing of these.

Herbaria@Home. Taxonomic, geographic, and object type focus: Plants; United Kingdom; specimen labels. Training: Onsite instructions and videos. Incentives: None. Contributors: 420. Transcriptions: 146,834. Interface: Zoom in on label; all fields seen at once, plant name provided, other field values provided by pull-down menu. Validation process: Approximately 1% of records are cross-checked by additional participants; data users can also make edits.

Smithsonian Digital Volunteers. Taxonomic, geographic, and object type focus: Life; global, from within Smithsonian collections; specimen labels and field notebooks. Training: Onsite tutorials and tips. Incentives: None. Contributors: 1,163. Transcriptions: 26,520 (as of April 2014). Interface: Zoom and pan in window; fields divided into several windows (labels) or in one field (notebooks). Validation process: Participants review completed pages.

Les Herbonautes (National Herbarium in France). Taxonomic, geographic, and object type focus: Plants and algae; global, but especially Europe; specimen labels. Training: Onsite guidelines; participants start with simple transcription fields (e.g., country) and are tested before progressing to more challenging fields. Incentives: Badges earned on completion of a certain number of transcriptions. Contributors: 1,859. Transcriptions: 1,292,722 (contributions of individual values). Interface: Zoom in window; all fields seen at once. Validation process: Validation of individual fields by other participants, usually 2, until consensus is reached.

Discover Life. Taxonomic, geographic, and object type focus: Plants and insects; global; specimen labels. Training: Onsite guidelines and help. Incentives: None. Contributors: 64. Transcriptions: 3,489. Interface: Zoom in window; all fields seen at once. Validation process: No validation.

Best practices and standards

To our knowledge, there are no best practice documents specifically targeted at engaging the public in transcription for biodiversity research collections. However, there are best practices for specimen imaging that must occur to permit online transcription and annotation (Häuser et al. 2005; http://sciweb.nybg.org/Science2/hcol/mtsc/NYBG_Best_Practices.doc), and there are best practices that are generally relevant to the digitization activities identified in table 1, such as DataONE's Primer on Data Management (http://dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf) and the online Citizen Science Central Toolkit (http://birds.cornell.edu/citscitoolkit/toolkit/steps). On the basis of the experience of three of us in developing two of the transcription tools (DigiVol and Notes from Nature), we suggest several considerations related to the online tool, its interface, and the most efficient ways a participant can engage with it (box 1). Many of these recommendations also have clear relevance to the georeferencing and annotating activities that we discuss below.

Box 1. Our recommendations for online transcription tools.

The image display should produce a clear view of all relevant text at an appropriate zoom level at once or via panning.

Data entry fields should be accessible whilst viewing the image.

Drop-down lists should be provided when the universe of acceptable responses can be populated from controlled vocabularies and is relatively small (e.g., the 50 US states); autocomplete functionality in free text fields should be provided when the number of acceptable responses is larger and cannot be fully populated from the beginning of the project (e.g., collector names).

Dependencies in the acceptable values for fields should be built in (e.g., only those counties from the state of Georgia are available in a dropdown once the state is established as Georgia); a minimal sketch of this recommendation appears after this box.

Readily accessible examples and directions for each field should be available during the activity.

Forums to enable volunteers to ask questions about specific specimens or ledgers or the general process of transcription to the project manager and each other should be provided.

A task completion count should provide the public participant with both progress towards the project's digitization goal and the participant's overall contributions to the project.

Scientists should regularly communicate to the public the value of the generated data (e.g., through a blog and social media); developers should regularly communicate new developments and bug fixes.

Response and loading time of images and transcription pages should be quick.

Transcribers should be permitted to explore the portion of the image containing the organism or to view an image of the taxon from another source (e.g., Notes from Nature's Macrofungi Interface displays images of the taxon from the Encyclopedia of Life).
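
The following is a minimal sketch of the field-dependency recommendation in box 1; the state-to-county vocabulary is a tiny illustrative subset rather than a complete authority file, and the function names are hypothetical.

```python
# A minimal sketch of the dependency recommendation: counties offered in a
# drop-down are constrained by the state already entered. The vocabulary is a
# tiny illustrative subset, not a complete authority file.
COUNTIES_BY_STATE = {
    "Georgia": ["Appling", "Atkinson", "Bacon"],   # ...and 156 more
    "Florida": ["Alachua", "Baker", "Bay"],        # ...and 64 more
}

def county_choices(state):
    """Return the drop-down options once the state field has been filled in."""
    return COUNTIES_BY_STATE.get(state, [])

def validate_entry(state, county):
    """Reject a county that is not an acceptable value for the chosen state."""
    return county in county_choices(state)

print(county_choices("Georgia"))
print(validate_entry("Georgia", "Alachua"))   # False: Alachua is a Florida county
```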

Relevant sources of standards for this activity and, to some extent, the other two include the Dublin Core Metadata Initiative (http://dublincore.org), the Darwin Core for biodiversity information (http://rs.tdwg.org/dwc; Wieczorek et al. 2012), the Audubon Core for metadata about multimedia files associated with biodiversity research collections and resources (http://tdwg.org/standards/638), and the Ecological Metadata Language project (http://knb.ecoinformatics.org). Specific to the markup of text in the humanities is XML-TEI (http://tei-c.org/index.xml), which is important in the context of transcribing ledgers.

Gaps in our knowledge and areas for improvement

Despite recent recommendations from the Notes from Nature project (Hill et al. 2012) and limited research into motivations of citizen scientists (Rotman et al. 2014), we still lack a satisfactory understanding of several aspects of public participation in transcribing biodiversity specimen labels and ledgers. These include the most significant factors affecting efficiency, accuracy, initial motivation, and long-term engagement; the best algorithms to produce consensus transcriptions from multiple replicates; and the most effective data validation methods. Each of these also has clear relevance to the georeferencing and annotating activities. Improvements to transcription tools could enhance participant enjoyment and ease of use. For example, new functionality could give contributors more control of their transcription experience, such as the ability to establish the criteria used to determine the specimens that they transcribe (e.g., on the basis of the collection supplying the specimen images or the occurrence of a word in the OCR text strings generated from images) or the ability to toggle between interfaces that show a single field at a time and multiple fields at a time. Furthermore, records could be sorted for transcription based on similarity (e.g., overall similarity of OCR text strings). Improvements could also address data quality issues by allowing participants to return to earlier transcription records to correct what they later learn are transcription errors. The biodiversity research collections community would also benefit from greater sharing of best practices and tools with the digital humanities community, in which projects such as University College London's Transcribe Bentham (http://blogs.ucl.ac.uk/transcribe-bentham), the University of Iowa's Civil War Diaries and Letters Transcription Project (http://digital.lib.uiowa.edu/cwd), and the Medici Archives Project (http://medici.org), as well as standalone tools such as Ben Brumfield's FromThePage (http://beta.fromthepage.com) for transcription and Juxta (http://juxtasoftware.org) for the comparison of multiple transcriptions of a single text, have objectives that overlap significantly with those of the biodiversity community.
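
As an illustration of the consensus problem noted above, the following is a naive sketch of one possible approach: a per-field majority vote over replicate transcriptions after light normalization, with fields that fail to reach a majority flagged for expert review. This is not the reconciliation method used by Notes from Nature or any other tool discussed here; the field names and example replicates are illustrative.

```python
from collections import Counter

# A naive sketch of consensus transcription: a per-field majority vote over
# replicate transcriptions after light normalization (lowercasing and trimming),
# with fields that fail to reach a majority flagged for expert review.
def consensus(replicates, threshold=0.5):
    fields = {key for rep in replicates for key in rep}
    agreed, flagged = {}, []
    for field in fields:
        values = [rep[field].strip().lower() for rep in replicates if field in rep]
        value, count = Counter(values).most_common(1)[0]
        if count / len(values) > threshold:
            agreed[field] = value
        else:
            flagged.append(field)
    return agreed, flagged

replicates = [
    {"recordedBy": "J. Smith", "eventDate": "12 Jun 1957"},
    {"recordedBy": "J. Smith", "eventDate": "12 Jun 1957"},
    {"recordedBy": "J. Smyth", "eventDate": "12 Jun 1957"},
    {"recordedBy": "J. Smith", "eventDate": "12 Jan 1957"},
]
print(consensus(replicates))   # both fields reach a 3-of-4 majority
```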

Online activity 2: Georeferencing

Georeferencing, as applied to biodiversity research collections, is the inference of a geospatial geometry from the textual collection locality description on a label or in a ledger (figure 2; Guralnick et al. 2006).

Figure 2.

Example of georeferencing results by volunteers for a single locality. In this example, nine minimally trained undergraduate students georeferenced this herbarium specimen label. Two of these were outside of the bounds of the national forest and were removed as obvious outliers. The seven remaining points are represented by the green dots. A mathematical mean of these points is shown with a red dot. A local expert familiar with Apalachicola National Forest (where this specimen was collected) georeferenced the label as represented by the yellow dot.

Overview

The geospatial geometry is often expressed as a single point representing latitude and longitude, usually with an associated radius allowing representation of uncertainty (Wieczorek et al. 2004). However, localities could also be represented as multipoints, lines, multilines, polygons, and multipolygons to better reflect either the collection method or imprecision associated with the interpretation of a textual collection locality description. For example, sampling transects may be recorded as a line with start and stop coordinates, as is common in samples from trawlers. The expression of uncertainty is crucial to determining a data record's fitness for use (Wieczorek et al. 2004). For example, point data with an uncertainty of 10 km may be unsuitable for an analysis across 1-km-resolution environmental gradients. Georeferences as latitude and longitude coordinates and the datum on which the coordinates are based are typically lacking from terrestrial and inland aquatic specimens collected before the 1990s (Beaman and Conn 2003; marine specimens might differ). Where those are available, they can provide useful validation for textual descriptions or vice versa, because such latitude and longitude readings also have associated, and often unreported, uncertainties.
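
As a concrete illustration, a point-radius georeference can be recorded with Darwin Core terms such as decimalLatitude, decimalLongitude, geodeticDatum, and coordinateUncertaintyInMeters, and the stated uncertainty then supports a simple fitness-for-use test. The coordinate values and threshold in the sketch below are illustrative assumptions, not an actual georeference of any specimen.

```python
# A point-radius georeference expressed with Darwin Core terms; the coordinates
# and the uncertainty value are illustrative, not an actual georeference.
georeference = {
    "verbatimLocality": "7.8 miles north of Bogalusa at Hwy 21, Washington Parish, Louisiana",
    "decimalLatitude": 30.8935,                # illustrative value
    "decimalLongitude": -89.8550,              # illustrative value
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": 250,      # radius capturing interpretation error
    "georeferenceProtocol": "point-radius method (Wieczorek et al. 2004)",
}

def fit_for_use(record, required_uncertainty_m):
    """Is the record precise enough for an analysis at the given resolution?"""
    return record["coordinateUncertaintyInMeters"] <= required_uncertainty_m

print(fit_for_use(georeference, 1000))   # True: usable across a 1-km grid
print(fit_for_use(georeference, 100))    # False: too uncertain for a 100-m grid
```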

Public participants can be expected to be most efficient and accurate at georeferencing when they can read the language in which the label was written, can read relevant map types (e.g., topographic or nautical), and have some familiarity with the area in which the specimen was collected (i.e., experience on the ground or with locally used names). Useful emphases in training for the task can be placed on basic geographical skills, such as identifying the locality information and interpreting locality types; interpreting geographic jargon, compass bearings, abbreviations, and formats; and understanding the common types of geographic projections (e.g., equal area), coordinate systems (e.g., Universal Transverse Mercator), and geodetic systems (e.g., World Geodetic System 1984). Training will also improve a participant's ability to interpret locality descriptions and uncertainties. For these skills, training emphases can be placed on finding and using relevant maps and indices of place names, precisely describing the georeferencing method in a standard way, using known sampling biases to interpret locality descriptions (e.g., the tendency to collect near existing roads), and describing uncertainty quantitatively (e.g., as the radius of a circle) or using other geometries (e.g., a polygon). An understanding of the historical context and training in interpreting the patterns in historical aerial photographs that are relevant to predicting the community type at alternative locations (e.g., swamp versus upland) are also helpful. The extent to which the training is needed will vary depending on the locality descriptions. For example, the description “Pushepatapa Creek, 7.8 miles north of Bogalusa at Hwy 21; Washington Parish; Louisiana” requires very little expertise to pinpoint, because it is at the intersection of a bridge and a creek. However, the description “San Francisco Bay, Shag Rock, S. 58° W, Rt. Tang. Pt. Avisadero, S. 74° W., Goat Island. Lighthouse, N. 21°W.; United States” requires an understanding of compass bearings and reading navigational charts.
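
As a small worked example of the compass-bearing skill just mentioned, the sketch below converts a quadrant bearing such as “S. 58° W.” to an azimuth measured clockwise from north; the parsing is deliberately simplified and the helper is hypothetical, not part of any georeferencing tool discussed here.

```python
import re

# A hypothetical helper for the compass-bearing skill: convert a quadrant bearing
# such as "S. 58° W." to an azimuth in degrees measured clockwise from north.
# Parsing is deliberately simplified.
def bearing_to_azimuth(bearing):
    ref, angle, turn = re.match(r"([NS])\.?\s*(\d+(?:\.\d+)?)°?\s*([EW])", bearing.strip()).groups()
    angle = float(angle)
    if ref == "N":
        return angle if turn == "E" else (360 - angle) % 360
    return 180 - angle if turn == "E" else 180 + angle

print(bearing_to_azimuth("S. 58° W."))   # 238.0
print(bearing_to_azimuth("N. 21°W."))    # 339.0
```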

We are aware of a single online tool that has been used to engage the public in the georeferencing of specimens, although many other examples of what has been called “volunteered geographic information” (Goodchild 2007, Elwood et al. 2011) exist. GEOLocate provides users with, among other functions, a Collaborative Georeferencing Web Client (http://museum.tulane.edu/geolocate/community/default.html)—a framework for managing a community of georeferencers and a tool that automatically interprets textual locality information and supplies candidate points and associated uncertainties (radii and polygons). A georeferencing volunteer or technician can evaluate the automated results against various online base maps (e.g., aerial photography) to select the most appropriate point and uncertainty description or make modifications as necessary. Occasionally, additional resources such as Google Earth (http://earth.google.com), historical paper maps, web searches, original field notes, detailed specimen records, ship logs, and cemetery records are required to accurately determine the location of collection, and these resources can be recorded by the georeferencer along with data quality issues. Although it is not wholly volunteer-based, the FishNet Project (http://fishnet2.net; an online archive of fish collection holdings around the world) illustrates an implementation of GEOLocate for volunteer georeferencing. In that project, the georeferencing of 3.7 million lots is distributed among staff technicians and occasional volunteers at 12 institutions around the United States, with georeferencing responsibilities partitioned by the geographic origin of the specimen (e.g., Africa) rather than by the collection that curates the specimen.

Best practices and standards

Best practice documents specific to georeferencing specimens include Guide to Best Practices for Georeferencing (Chapman et al. 2006), Principles and Methods of Data Cleaning—Primary Species and Species-Occurrence Data (Chapman 2005), and Guide to Best Practices for Generalising Sensitive Species Occurrence Data (Chapman and Grafton 2008). However, the geospatial community has produced many other best practice documents, including those related to standards (e.g., as at the Open Geospatial Consortium; http://opengeospatial.org/standards/bp) and commercial or open-source geographic information systems (e.g., as found at ESRI; http://esri.com). A useful clearinghouse for information about the process of georeferencing specimens is provided by VertNet (http://vertnet.org) at http://georeferencing.org.

We are unaware of best practice documents produced to address public participation in the generation of geospatial data. However, on the basis of the experience of developing GEOLocate and implementing tools in projects such as VertNet (http://vertnet.org), we suggest several considerations that are important to successfully engage the public in this activity. The categorization of data records by the administrative unit of specimen origin (e.g., country, state, county) is useful for assigning records to public participants; a user survey can capture a participant's on-the-ground knowledge so that it can be aligned with specimen localities. Classification of georeferencing difficulty (using, e.g., the uncertainty that GEOLocate automatically assigns) is useful for assigning records as well; a participant's performance with control localities (where accurate coordinates are known) can be used to evaluate georeferencing skill. Each locality record should be georeferenced multiple times until the points reach some clustering threshold (a predefined spatial variance) or the replicates reach a limit, at which point the record is flagged for the attention of an expert. Recommendations made for transcription best practices are also relevant here, especially the provision of a forum for users to discuss specific localities or general patterns with each other and project scientists, leading to greater user proficiency and understanding.
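
The replicate-and-cluster protocol just described might be implemented along the following lines: average the replicate georeferences and flag the locality for expert review when the points are too dispersed. This is a simplified sketch; the distance calculation uses an equirectangular approximation, and the threshold, example coordinates, and treatment of uncertainty are illustrative assumptions rather than recommendations.

```python
import math

# A simplified sketch of the replicate-and-cluster protocol: average the replicate
# georeferences and flag the locality for expert review if the points are too
# dispersed. Distances use an equirectangular approximation; the threshold and the
# example coordinates are illustrative assumptions.
def consensus_georeference(points, max_spread_m=2000):
    """points: list of (latitude, longitude) replicate georeferences in decimal degrees."""
    mean_lat = sum(lat for lat, _ in points) / len(points)
    mean_lon = sum(lon for _, lon in points) / len(points)

    def distance_m(lat, lon):
        dlat = math.radians(lat - mean_lat)
        dlon = math.radians(lon - mean_lon) * math.cos(math.radians(mean_lat))
        return 6371000 * math.hypot(dlat, dlon)

    spread = max(distance_m(lat, lon) for lat, lon in points)
    if spread > max_spread_m:
        return {"status": "flag for expert review", "spread_m": round(spread)}
    return {"decimalLatitude": round(mean_lat, 5),
            "decimalLongitude": round(mean_lon, 5),
            "coordinateUncertaintyInMeters": round(spread),
            "replicates": len(points)}

replicates = [(30.170, -84.660), (30.168, -84.655), (30.173, -84.662), (30.169, -84.658)]
print(consensus_georeference(replicates))
```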

Relevant sources of standards for the generation and communication of geospatial data include the Federal Geographic Data Committee (http://fgdc.gov), the Open Geospatial Consortium (http://opengeospatial.org), and within Darwin Core (i.e., DC-location), as well as most of those presented for transcription.

Gaps in our knowledge and areas for improvement

We do not have a satisfactory understanding of several aspects of public participation in georeferencing, including the average number of replicate georeferencing events needed to reach a sufficient level of accuracy and effective methods for balancing accuracy and precision (e.g., by removal of outliers) to produce a useful consensus georeference. Still lacking are the ability to match georeferencing competencies with collection localities and sufficient strategies for assessing a user's georeferencing competencies initially and through time. A better understanding of how to enable collaboration and communication (e.g., by visualizing on a map the collection localities being discussed in a forum) is also needed.

Digital imaging and linking of field notes to specimens would likely provide a big benefit to georeferencing, because field notes can contain a wealth of information about collecting sites, including travel itineraries, site sketches, environmental information, and other remarks not often found on specimen labels. iDigBio's 2014 Digitizing from Source Materials Workshop (http://idigbio.org/wiki/index.php/Digitizing-From-Source-Materials) laid the groundwork for this link. The biodiversity research collections community would also benefit from greater sharing of best practices and tools with other communities, including the ecological citizen science projects that enable mapping of species observations (e.g., National Geographic's FieldScope project, http://education.nationalgeographic.com/education/program/fieldscope, and iNaturalist, http://inaturalist.org), digital humanities projects that rectify digital images of historical maps (e.g., Map Georeferencer, http://maps.nls.uk/projects/georeferencer/about.html, which has been used in the British Library Georeferencer Project, http://bl.uk/maps), and projects to develop “framework data” (sensu Elwood et al. 2012; e.g., OpenStreetMap, http://openstreetmap.org).

Online activity 3: Annotating

Beyond the label data used for the transcribing activity (online activity 1), a wealth of additional information can be derived from the image of the specimen and shared through annotations. Annotating can be variously characterized in Dunn and Hedges’ (2013) classification as tagging, categorizing, and cataloging, depending on constraints imposed on the activity (tagging versus categorizing) and the degree of structure in the metadata generated (more required for cataloging).

Overview

Physical annotations traditionally were associated with (e.g., pasted on or placed in the same jar as) a physical specimen that was visited at its home collection or examined while on loan to another collection. In online specimen annotation, a feature of interest can be described and measured from a digital image, often with an area of interest specified, linking the annotation not only to a specimen but also to a region on the specimen image (figure 3). Annotations can be related to taxonomic identity, phenological state or life stage, features in existence at the time of the collecting event (e.g., evidence of disease or herbivory), damage following the collecting event (e.g., from pests), entity–quality statements (e.g., the flower is red), landmarks for morphometric analysis, and so on (see table 1 for further detail). Annotations are not typically a focus of the initial specimen digitization (e.g., those task clusters described by Nelson et al. 2012) unless they are legacy physical annotations associated with the specimen at the time of digitization, but they can be fundamental to the downstream research applicability of specimens. For example, DNA barcoding can require voucher specimens with minimal damage (Jinbo et al. 2011), and the ability to search on legacy and digital annotations related to damage would increase efficiency. Researchers using public annotation applications would conceivably receive much richer information based on carefully constructed guided-choice questions, example images for comparison, and associated ontology classes that create properly formed entity–quality syntax statements (Gkoutos et al. 2004) from the user's choices.

Figure 3.

Example of a highly structured image annotation. An image of Ampulex compressa (F.) from the Museum für Naturkunde Berlin (http://morphbank.net/?id=102143) illustrates a few hypothetical public image annotation types, which engage the public actively in augmented image annotation for phenotype research. (A) A participant may be asked to draw a box around any damage to the specimen, which would be reported to a researcher as a positive response to this area of interest. (B) A guided question, “Outline the wings of the specimen,” which can be converted to a shape file of the wing. (C) A description of a certain quality. A participant would see the generic black-and-white image with a body part highlighted, along with a specimen, and be asked to describe that body part on the specimen itself—that is, “What color is this part of the body?” In this case, documentation of a red femur would be recorded. Abbreviation: mm, millimeter.
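
To illustrate how a guided choice such as the one in panel C of figure 3 might be captured as structured data, the sketch below builds an entity–quality statement from a participant's answer. The entity and quality labels would normally be drawn from ontologies (discussed further below), but the term identifiers shown here are placeholders, not real ontology IDs.

```python
# A sketch of turning a guided choice (panel C of figure 3) into a structured
# entity-quality statement. The term identifiers are placeholders, not real
# ontology IDs; in practice they would come from ontologies such as PATO or the
# Hymenoptera Anatomy Ontology.
def entity_quality_statement(image_id, entity_label, quality_label, participant):
    return {
        "image": image_id,
        "entity": {"label": entity_label, "term_id": "ENTITY:0000000"},    # placeholder
        "quality": {"label": quality_label, "term_id": "QUALITY:0000000"}, # placeholder
        "statement": f"the {entity_label} is {quality_label}",
        "annotatedBy": participant,
    }

# Panel C of figure 3: a participant reports a red femur on image 102143.
print(entity_quality_statement("morphbank:102143", "femur", "red", "volunteer_42"))
```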

Augmenting specimen information with useful conclusions from the specimen image encompasses a variety of strategies and techniques that can include both automation and public participation. For example, various research projects are exploring methods for automated taxonomic identification. Similar to facial recognition applications used to identify people, these methods require an accurate training data set of identified images from one or more standard angles. Examples of automation of this type include Leafsnap (Kumar et al. 2012) for identification of 184 tree species in the Northeastern United States and SPIDA (Russell et al. 2007) for one family of Australasian ground spiders. It is unclear whether—or how well—this implementation of computer vision will scale up to larger geographic areas and taxonomic groups in the future, and we note Russell and colleagues’ (2007) conclusion (p. 149): “Automating the identification of specimens to species is a difficult task. There is no reason to believe that teaching a computer to identify species will be any easier than teaching a person to do so. In fact, it is likely a trickier process altogether, considering the amazing ability of the human mind to compensate for missing information and recognize the similarity in objects.” Nevertheless, public participants could be important in the development of this process by building the training data sets for these automation methods as those algorithms become more successful. We are unaware of automation of the other types of annotation (e.g., finding image edges that could allow automated ways to measure typical trait values like body length and width).

Public participants can be expected to be most efficient and accurate at annotation when they have existing familiarity with the focal taxonomic group or the focal taxonomic group within a focal geographic region (e.g., millipedes of Arkansas), the use of authoritative resources (e.g., taxonomic keys and illustrated glossaries), and the use of relevant terms (e.g., leaves and glaucous). Useful emphases in taxa-specific training can be placed on recognizing relevant features of the focal taxonomic group, correct usage of relevant terms, use of specific resources (e.g., a key to the millipedes of Arkansas), and the protocol for describing relevant resources and methods used for reaching the conclusion of an annotation. Process- and image-specific training can include identifying typical changes that can occur in the phenotype after preservation as a specimen (e.g., common color changes or pest damage patterns) and typical distortions introduced by an imaging technique (e.g., deviations from a rectilinear projection or chromatic aberrations).

Many online applications enable public participation in the annotation of images (although not necessarily specimen images) in a constrained way (mostly falling within Dunn and Hedges’ 2013 categorizing process). For example, Citizen Sort (http://citizensort.org) has online games such as Happy Match (http://citizensort.org/web.php/happymatch), in which users categorize organisms in images. Crowdcrafting (http://crowdcrafting.org) is an open-source platform that enables image classification projects (as well as transcription and georeferencing projects), such as The Faces We Make (http://crowdcrafting.org/app/thefacewemake), in which users associate expressions on faces with emoticons for social science research. Zooniverse (http://zooniverse.org) has several image annotation projects, including those with biological applications, such as the Seafloor Explorer project (http://seafloorexplorer.org) and Condor Watch (http://condorwatch.org). The biological image repository Morphbank (http://morphbank.net) gives users the ability to annotate images with taxonomic and morphological observations to produce highly structured metadata that become searchable at that site. Citizen science communities building observational data sets (which might not be vouchered with a specimen) also perform taxonomic annotations with tools at iNaturalist (http://inaturalist.org), BugGuide (http://bugguide.net), and Mushroom Observer (http://mushroomobserver.org).

Best practices and standards

We are unaware of best practice documents that address public participation in the annotation of digital specimen images. However, best practice documents related to the creation and management of somewhat analogous annotations of images do exist in the digital humanities at Europeana Connect (http://europeanaconnect.eu; e.g., as it relates to map annotations), and there is a best practice document for specimen imaging (Häuser et al. 2005), which becomes especially important when features of the specimen are interpreted from the digital image (Zelditch et al. 2012). On the basis of the experience of four of us in developing Morphbank's image annotation tool, we suggest several considerations to successfully engage the public in this activity. Imaging protocols should take annotation into account when it is planned or can be anticipated (e.g., many beetles are identifiable only by the number of segments on the tarsus, and without that part in the image, an annotation of taxonomic identity is difficult). Also, users should have easy access to tools for zooming and panning and for designating an area of interest in the image to associate with the annotation. Finally, constraining annotation terms to those in controlled vocabularies (e.g., from ontologies or taxonomic authority files) can enable semantic processing and reduce spelling errors. Recommendations made above in reference to transcription and georeferencing best practices are also relevant here, especially the provision of a forum for users to discuss annotations with each other and project scientists, leading to greater user proficiency and understanding.
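
A minimal sketch of two of these recommendations, tying an annotation to an area of interest on the image and constraining the annotation term to a controlled vocabulary, follows; the vocabulary, field names, and coordinates are illustrative assumptions rather than any tool's actual data model.

```python
# A sketch of an annotation tied to an area of interest on the image, with the
# term checked against a small controlled vocabulary before it is accepted.
# The vocabulary, field names, and coordinates are illustrative only.
CONTROLLED_VOCABULARY = {"insect damage", "mold", "fading", "breakage"}

def make_annotation(image_id, term, polygon, annotator):
    """polygon: list of (x, y) pixel vertices outlining the area of interest."""
    if term not in CONTROLLED_VOCABULARY:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return {
        "image": image_id,
        "term": term,
        "areaOfInterest": polygon,
        "annotatedBy": annotator,
    }

annotation = make_annotation(
    image_id="morphbank:102143",
    term="insect damage",
    polygon=[(812, 430), (870, 430), (870, 495), (812, 495)],
    annotator="volunteer_7",
)
print(annotation)
```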

Standards specifically relevant to annotation include the taxonomic codes (McNeill et al. 2012, Tschöpe et al. 2013), the Apple Core extension of the Darwin Core (for sharing botanical annotations; http://code.google.com/p/applecore), and various controlled vocabularies that have the potential to greatly extend the value of annotations for discovery (Deans et al. 2012). Potentially relevant controlled vocabularies include the Phenotypic Quality Ontology (previously known as Phenotype and Trait Ontology; Mungall et al. 2010) for phenotypic characteristics, including color and shape; the Environment Ontology (http://environmentontology.org; Hirschman et al. 2008) for environmental and habitat descriptions; and general anatomical ontologies for various morphological details (e.g., the Hymenoptera Anatomy Ontology; Yoder et al. 2010).
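For illustration, a taxonomic annotation expressed with Darwin Core Identification-class terms might look like the following minimal record (a sketch only: the values are invented, and the record does not capture the additional botanical guidance provided by the Apple Core extension).

# Illustrative only: a taxonomic annotation expressed with Darwin Core
# Identification terms (the values are invented for this example).
annotation = {
    "dwc:scientificName": "Quercus alba L.",
    "dwc:identifiedBy": "A. Volunteer",
    "dwc:dateIdentified": "2014-06-15",
    "dwc:identificationReferences": "regional key to oaks (placeholder)",
    "dwc:identificationRemarks": "identified from leaf shape visible in the image; acorns not present",
}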

Gaps in our knowledge and areas for improvement

We do not yet have a satisfactory understanding of several aspects of public participation in annotation, including the interface designs best suited to capturing complex data while maintaining participants' interest and furthering science-literacy goals; the accuracy rates for different forms of annotation (e.g., taxonomic identification or determination of phenological state); and the most successful methods of quality control for variable citizen science contributions.
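As one example of a quality-control method whose performance still needs evaluation for specimen annotation, the sketch below (Python; the function and agreement threshold are illustrative assumptions, not a protocol used by any of the projects discussed) applies simple majority-vote consensus to redundant annotations of the same specimen image.

from collections import Counter

def consensus(labels, threshold=0.66):
    """Return the majority label if enough annotators agree; otherwise None,
    signalling that the record should be routed to an expert."""
    value, count = Counter(labels).most_common(1)[0]
    return value if count / len(labels) >= threshold else None

# Three volunteers annotate the phenological state of the same specimen image.
print(consensus(["flowering", "flowering", "fruiting"]))  # 'flowering'
print(consensus(["flowering", "fruiting", "sterile"]))    # None -> expert review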

The annotation activity could be improved by providing more advanced image-viewing tools on the public participation sites, such as side-by-side image comparisons and transparency overlays that allow one image to be compared directly on top of another (e.g., two leaf images); by recording more complete annotation metadata, such as the zoom level and frame viewed at the time of annotation; and by allowing greater flexibility in the designation of an area of interest (e.g., using multiple polygons or edge-detection and selection tools).
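A minimal sketch of what such richer annotation metadata might look like follows (Python; the structure and field names are illustrative assumptions rather than the schema of any existing tool).

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ImageAnnotation:
    # Illustrative structure only: records the view state at the moment of annotation.
    image_id: str
    term: str                                  # e.g., a controlled-vocabulary term
    zoom_level: float                          # magnification at the time of annotation
    viewport: Tuple[int, int, int, int]        # (x, y, width, height) of the frame viewed
    area_of_interest: List[Tuple[int, int]] = field(default_factory=list)  # polygon vertices in pixel coordinates

note = ImageAnnotation("img-001", "glaucous", 4.0, (1200, 800, 640, 480),
                       [(1250, 830), (1400, 830), (1400, 950), (1250, 950)])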

Conclusions

Data about a huge number of biodiversity research specimens (perhaps as many as 900 million in the United States) remain locked in cabinets, not yet represented online in digital form. We see engagement of the public in that digitization as one strategy to accelerate data capture for urgent societal challenges, such as predicting biotic responses to climate change and invasive species. Here, we reviewed the state of public participation in three major areas of digitization: transcription, georeferencing, and annotation. Each of these activities contributes crucial data to research and offers educational opportunities, but public participation in transcription is the most advanced of the three, perhaps because the latter two activities become more efficient once the specimen's identity and collection-locality description have been digitized.

Across the three major digitization tasks, several common needs for improvement can be noted. We recognize seven high-priority steps for the community to take in this area. (1) All of the public participation tools for biodiversity specimen digitization that we have discussed are relatively new, and experimental data on optimal user-interface configurations (e.g., those that maximize efficiency, accuracy, or user enjoyment) are almost entirely lacking. New tool development in this area should be driven by experiments, user surveys, and participatory design principles. (2) Experimentation is also merited in the area of data quality control. For example, which transcription method yields the greatest accuracy and efficiency (Brumfield 2012)? And which crowd-consensus benchmarks (Sheshadri and Lease 2013) are most useful in these activities? (3) Although the need for tools to engage the public in various forms of transcription and annotation is certainly not fully met and there is ample room for improvement, the restriction of the georeferencing activity to a single tool with somewhat nascent public participation functionality (GEOLocate's Collaborative Georeferencing Web client) suggests that development in that area is especially crucial. (4) Motivation for initial and sustained user engagement is an active area of research but is still poorly understood and would benefit from more widespread use of user surveys. (5) The relative paucity of education and outreach materials complementing public engagement in digitization represents an area for considerable growth; for example, ZooTeach offers over 49 lesson plans using Zooniverse science tools, but none of them involve Notes from Nature. (6) Existing best practice and standards documents do not sufficiently cover the needs of this area of public participation. We are optimistic that the reinvigorated Biodiversity Information Standards (TDWG) Citizen Science Working Group (as of the 2013 TDWG meeting) and the iDigBio Public Participation in the Digitization of Biodiversity Specimens Working Group will provide leadership here. (7) Interoperability between digitization tools that engage the public and specimen data management systems (e.g., Specify and Symbiota) should be expanded beyond the current small number of cases. In one of the few examples, FilteredPush (http://wiki.filteredpush.org) can send annotations made in Morphbank to a collection curator's Specify or Symbiota specimen data management system. iDigBio and Notes from Nature hosted a workshop on this topic in fall 2014.
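To illustrate the kind of structured exchange envisioned in step 7, the sketch below shows a hypothetical annotation payload that a public participation tool might push to a collection's data management system; the field names and values are our own illustration and do not describe the actual FilteredPush, Specify, or Symbiota interfaces.

import json

# Hypothetical annotation message; not the schema of FilteredPush, Specify, or Symbiota.
annotation_message = {
    "targetCollection": "Example Herbarium",
    "catalogNumber": "000123456",
    "annotationType": "identification",
    "proposedValue": "Quercus alba L.",
    "annotatedBy": "A. Volunteer",
    "evidence": "identifier of the specimen image consulted (placeholder)",
    "dateAnnotated": "2014-06-15",
}
print(json.dumps(annotation_message, indent=2))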

Finally, the development of a public digitization project currently relies on somewhat ad hoc negotiations between a collection curator and the managers of the relevant public participation tools, each of whom requires different information in different formats. This can slow progress and is an area in which standardization has the potential to make the creation and management of public digitization projects accessible not just to all collections curators but also to members of the public (e.g., a local chapter of a native plant society). Empowering the latter group has the potential to engage far more participants by better aligning the available digitization projects with the motivations of the public, making the projects collaborative or cocreated rather than simply contributory (sensu Shirk et al. 2012); by contrast, opportunities for public engagement today are largely contingent on decisions made by collections curators and tool managers. We are encouraged by projects in this area such as iDigBio's Biospex Public Participation Management System (http://biospex.org).

The authors thank the following people for useful conversations on this topic: Melody Basham, Jim Beach, Jason Best, Cathy Bester, David Bonter, Ben Brumfield, Michael Denslow, Renato Figueiredo, Jose Fortes, Charlotte Germain-Aubrey, Michael Giddens, Ed Gilbert, Jonathan Hendricks, Austin Hendy, Andrew Hill, Kevin Love, Bruce MacFadden, Elizabeth Martin, Andrea Matsunaga, Tom Nash, Larry Page, Richard Primack, Pam Soltis, Julie Speelman, Patrick Sweeney, Barbara Thiers, Alex Thompson, Bill Watson, Andrea Wiggins, Nathan Wilson, Alison Young, and Jessica Zelt. iDigBio is funded by a grant from the National Science Foundation's Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement no. EF-1115210). The National Ecological Observatory Network (NEON) is a project sponsored by the National Science Foundation and managed under cooperative agreement by NEON. This material is based on work supported by the National Science Foundation under Cooperative Agreement no. EF-1029808. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References cited

Barber A, Lafferty D, Landrum LR. 2013. The SALIX Method: A semi-automated workflow for herbarium specimen digitization. Taxon 62: 581–590.

Beach J, et al. 2010. A Strategic Plan for Establishing a Network Integrated Biocollections Alliance. iDigBio.

Beaman R, Conn B. 2003. Automated geoparsing and georeferencing of Malesian collection locality data. Telopea 10: 43–52.

Benz S, Miller-Rushing A, Domroese M, Ballard H, Bonney R, DeFalco T, Newman S, Shirk J, Young A. 2013. Workshop 1: Conference on Public Participation in Scientific Research 2012: An international, interdisciplinary conference. Bulletin of the Ecological Society of America 94: 112–117.

Bird TJ, et al. 2014. Statistical solutions for error and bias in global citizen science datasets. Biological Conservation 173: 144–154.

Bonney R, Cooper CB, Dickinson J, Kelling S, Phillips T, Rosenberg KV, Shirk J. 2009. Citizen science: A developing tool for expanding science knowledge and scientific literacy. BioScience 59: 977–984.

Bonney R, Shirk JL, Phillips TB, Wiggins A, Ballard HL, Miller-Rushing AJ, Parrish JK. 2014. Next steps for citizen science. Science 343: 1436–1437.

Bonter DN, Cooper CB. 2012. Data validation in citizen science: A case study from Project FeederWatch. Frontiers in Ecology and the Environment 10: 305–307.

Brumfield B. 2012. Quality control for crowdsourced transcription. In Brumfield B, ed. Collaborative Manuscript Transcription (blog). BlogSpot.

Chapman AD. 2005. Principles and Methods of Data Cleaning: Primary Species and Species-Occurrence Data. Global Biodiversity Information Facility.

Chapman A, Grafton O. 2008. Guide to Best Practices for Generalising Sensitive Species Occurrence Data. Global Biodiversity Information Facility.

Chapman AD, Wieczorek J, BioGeomancer Consortium. 2006. Guide to Best Practices for Georeferencing. Global Biodiversity Information Facility.

Cook JA, et al. 2014. Aiming up: Natural history collections as emerging resources for innovative undergraduate education in biology. BioScience 64: 725–734.

Deans AR, Yoder MJ, Balhoff JP. 2012. Time to change how we describe biodiversity. Trends in Ecology and Evolution 27: 78–84.

Dickinson JL, Zuckerberg B, Bonter DN. 2010. Citizen science as an ecological research tool: Challenges and benefits. Annual Review of Ecology, Evolution, and Systematics 41: 149–172.

Dunn S, Hedges M. 2013. Crowd-sourcing as a component of humanities research infrastructures. International Journal of Humanities and Arts Computing 7: 147–169.

Ehrlich PR, Pringle RM. 2008. Where does biodiversity go from here? A grim business-as-usual forecast and a hopeful portfolio of partial solutions. Proceedings of the National Academy of Sciences 105: 11579–11586.

Elwood S, Goodchild MF, Sui DZ. 2011. Researching volunteered geographic information: Spatial data, geographic research, and new social practice. Annals of the Association of American Geographers 102: 571–590.

Erb LP, Ray C, Guralnick R. 2011. On the generality of a climate-mediated shift in the distribution of the American pika (Ochotona princeps). Ecology 92: 1730–1735.

Everill PH, Primack RB, Ellwood ER, Melaas EK. 2014. Determining past leaf-out times of New England's deciduous forests from herbarium specimens. American Journal of Botany 101: 1293–1300. doi:10.3732/ajb.1400045

Falk JH, Dierking LD. 2000. Learning from Museums: Visitor Experiences and the Making of Meaning. AltaMira Press.

Falk JH, Donovan E, Woods R. 2001. Free-Choice Science Education: How We Learn Science Outside of School. Teachers College Press.

Gkoutos GV, Green EC, Mallon AM, Hancock JM, Davidson D. 2004. Using ontologies to describe mouse phenotypes. Genome Biology 6 (art. R8).

Goodchild M. 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69: 211–221.

Guralnick RP, Wieczorek J, Beaman R, Hijmans RJ, BioGeomancer Working Group. 2006. BioGeomancer: Automated georeferencing to map the world's biodiversity data. PLOS Biology 4 (art. e381).

Häuser C, Steiner A, Holstein J, Scoble M. 2005. Digital Imaging of Biological Type Specimens: A Manual of Best Practice. Stuttgart.

Heidorn PB, Zhang Q. 2013. Label annotation through biodiversity enhanced learning. Pages 882–884 in iConference 2013 Proceedings. iConference.

Hill A, et al. 2012. The Notes from Nature tool for unlocking biodiversity records from museum records through citizen science. ZooKeys 209: 219–233.

Hirschman L, et al. 2008. Habitat-Lite: A GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology 12: 129–136.

Jenkins M. 2003. Prospects for biodiversity. Science 302: 1175–1177.

Jinbo U, Kato T, Ito M. 2011. Current progress in DNA barcoding and future implications for entomology. Entomological Science 14: 107–124.

Jordan RC, Gray SA, Howe DV, Brooks WR, Ehrenfeld JG. 2011. Knowledge gain and behavioral change in citizen-science programs. Conservation Biology 25: 1148–1154.

Kelty C, Panofsky A, Currie M, Crooks R, Erickson S, Garcia P, Wartenbe M, Wood S. 2014. Seven dimensions of contemporary participation disentangled. Journal of the Association for Information Science and Technology. Forthcoming. doi:10.1002/asi.23202

Kremen C, Ullman KS, Thorp RW. 2011. Evaluating the quality of citizen-scientist data on pollinator communities. Conservation Biology 25: 607–617.

Kumar N, Belhumeur PN, Biswas A, Jacobs DW, Kress WJ, Lopez IC, Soares JV. 2012. Leafsnap: A computer vision system for automatic plant species identification. Pages 502–516 in Computer Vision–ECCV 2012. Springer.

Lintott CJ, Schawinski K, Slosar A, Land K, Bamford S, Thomas D, Raddick MJ, Nichol RC, Szalay A, Andreescu D. 2008. Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society 389: 1179–1189.

Lister AM, Climate Change Research Group. 2011. Natural history collections as sources of long-term datasets. Trends in Ecology and Evolution 26: 153–154.

Loreau M, et al. 2006. Diversity without representation. Nature 442: 245–246.

Masters KL. 2013. A Zoo of Galaxies. arXiv preprint arXiv:1303.7118.

McNeill J, et al. 2012. International Code of Nomenclature for Algae, Fungi and Plants (Melbourne Code) Adopted by the Eighteenth International Botanical Congress Melbourne, Australia, July 2011. Koeltz Scientific Books.

Moritz C, Patton JL, Conroy CJ, Parra JL, White GC, Beissinger SR. 2008. Impact of a century of climate change on small-mammal communities in Yosemite National Park, USA. Science 322: 261–264.

Mungall CJ, Gkoutos GV, Smith CL, Haendel MA, Lewis SE, Ashburner M. 2010. Integrating phenotype ontologies across multiple species. Genome Biology 11 (art. R2).

National Research Council. 2009. Learning Science in Informal Environments: People, Places, and Pursuits. National Academies Press.

National Research Council. 2012. A Framework for K–12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. National Academies Press.

Nelson G, Paul D, Riccardi G, Mast A. 2012. Five task clusters that enable efficient and effective digitization of biological collections. ZooKeys 209: 19–45.

Nerbonne JF, Vondracek B. 2003. Volunteer macroinvertebrate monitoring: Assessing training needs through examining error and bias in untrained volunteers. Journal of the North American Benthological Society 22: 152–163.

Newman G, Graham J, Crall A, Laituri M. 2011. The art and science of multi-scale citizen science support. Ecological Informatics 6: 217–227.

Parmesan C, Yohe G. 2003. A globally coherent fingerprint of climate change impacts across natural systems. Nature 421: 37–42.

Parr CS, Guralnick R, Cellinese N, Page RDM. 2012. Evolutionary informatics: Unifying knowledge about the diversity of life. Trends in Ecology and Evolution 27: 94–103.

Penrose D, Call SM. 1995. Volunteer monitoring of benthic macroinvertebrates: Regulatory biologists' perspectives. Journal of the North American Benthological Society 14: 203–209.

Rainbow PS. 2009. Marine biological collections in the 21st century. Zoologica Scripta 38: 33–40.

Rotman D, Hammock J, Preece J, Hansen D, Boston C, Bowser A, He Y. 2014. Motivations affecting initial and long-term participation in citizen science projects in three countries. Pages 110–124 in iConference 2014 Proceedings. iConference.

Russell KN, Do MT, Huff JC, Platnick NI. 2007. Introducing SPIDA-Web: Wavelets, neural networks and Internet accessibility in an image-based automated identification system. Pages 131–152 in MacLeod N, ed. Automated Taxon Identification in Systematics: Theory, Approaches and Applications. CRC Press, Taylor and Francis Group.

Sheshadri A, Lease M. 2013. SQUARE: A benchmark for research on computing crowd consensus. Pages 156–164 in Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing. Association for the Advancement of Artificial Intelligence.

Shirk JL, et al. 2012. Public participation in scientific research: A framework for deliberate design. Ecology and Society 17 (art. 29).

Tschöpe O, Macklin JA, Morris RA, Suhrbier L, Berendsohn WG. 2013. Annotating biodiversity data via the Internet. Taxon 62: 1248–1258.

Wake DB, Vredenburg VT. 2008. Are we in the midst of the sixth mass extinction? A view from the world of amphibians. Proceedings of the National Academy of Sciences 105: 11466–11473.

Walther G-R, Post E, Convey P, Menzel A, Parmesan C, Beebee TJC, Fromentin J-M, Hoegh-Guldberg O, Bairlein F. 2002. Ecological responses to recent climate change. Nature 416: 389–395.

Whitmer A, et al. 2010. The engaged university: Providing a platform for research that transforms society. Frontiers in Ecology and the Environment 8: 314–321.

Wieczorek J, Guo Q, Hijmans R. 2004. The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. International Journal of Geographical Information Science 18: 745–767.

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D. 2012. Darwin Core: An evolving community-developed biodiversity data standard. PLOS ONE 7 (art. e29715).

Wiggins A, Crowston K. 2011. From conservation to crowdsourcing: A typology of citizen science. Pages 1–10 in Proceedings of the 2011 44th Hawaii International Conference on System Sciences (HICSS). IEEE Computer Society.

Yoder MJ, Mikó I, Seltmann KC, Bertone MA, Deans AR. 2010. A gross anatomy ontology for Hymenoptera. PLOS ONE 5 (art. e15991).

Zelditch ML, Swiderski DL, Sheets HD. 2012. Geometric Morphometrics for Biologists: A Primer. Academic Press.