Renée F Brown, The Importance of Data Citation, BioScience, Volume 71, Issue 3, March 2021, Page 211, https://doi.org/10.1093/biosci/biab012
Over the past decade, a growing movement toward greater accessibility, transparency, and reproducibility has taken hold in the scientific community. One area of open science that has gained significant traction is the concept of FAIR data principles—that is, guidelines ensuring data are findable, accessible, interoperable, and reusable. Indeed, many scientific journals now require data be made available in FAIR-aligned repositories prior to publication—and, in some cases, even prior to acceptance. These issues are important to me as an information manager in the US Long Term Ecological Research (LTER) Network, which has long been a pioneer in scientific data management. FAIR data are accompanied by high-quality metadata, and when archived in FAIR-aligned repositories, they receive persistent, unique identifiers (e.g., a digital object identifier, or DOI) to track provenance and enable data citation. While great strides have been made to establish data management best practices, scientists are lagging behind with respect to proper data citation. Data are a valuable research output, and providers should be credited accordingly. Moreover, scientists have a professional responsibility to ensure that data are properly cited.
Citing data is as easy as citing a publication; thus, the format and placement of data citations should be quite familiar. Data citations should include, at minimum, the author, year of publication, title, publisher (the data repository), DOI, and, where appropriate, the version or date of access. Citations should appear wherever data are referenced in manuscripts and grant proposals and should also be embedded in the references cited section, alongside literature references. When data are directly connected with publications and vice versa, they become easier to find, thereby increasing research transparency and reproducibility while also encouraging data reuse for new research and synthesis. Moreover, data citation metrics, which are most reliably tracked through proper citation, can provide important insights to data providers and funding agencies. Knowing how, when, where, and why data are being used, as well as who is using them, can inspire new research directions, collaborations, and funding opportunities. Data citations should also be included in curricula vitae, where, along with publication records, they can be an important metric for assessing candidates for jobs or promotion.
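As a rough illustration of the minimum elements listed above, the following Python sketch assembles a data citation string from its parts. The class name `DataCitation`, the method `to_text`, and the ordering and punctuation of the output are assumptions made for this example only; journals and repositories each prescribe their own citation style, and the values shown in the usage lines are placeholders, not a real dataset.

```python
from dataclasses import dataclass
from typing import Optional


def _sentence(text: str) -> str:
    """Return text ending in exactly one period."""
    text = text.strip()
    return text if text.endswith(".") else text + "."


@dataclass
class DataCitation:
    """Hypothetical container for the minimum elements of a data citation."""

    author: str
    year: int
    title: str
    publisher: str  # the data repository
    doi: str  # bare DOI, without the https://doi.org/ prefix
    version: Optional[str] = None  # include where appropriate
    accessed: Optional[str] = None  # date of access, where appropriate

    def to_text(self) -> str:
        # Assumed order: author, year, title, version, publisher, DOI, access date.
        parts = [_sentence(self.author), f"{self.year}.", _sentence(self.title)]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(_sentence(self.publisher))
        parts.append(f"https://doi.org/{self.doi}")
        if self.accessed:
            parts.append(f"(accessed {self.accessed})")
        return " ".join(parts)


# Placeholder values for illustration only.
example = DataCitation(
    author="Doe, J.",
    year=2020,
    title="Example dataset",
    publisher="Example Repository",
    doi="10.0000/example",
    version="2",
)
print(example.to_text())
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easier to spot a missing element (for example, a DOI) before the citation reaches a reference list.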
Finally, in keeping with open science principles, data providers should include a nonrestrictive licensing scheme that determines how data may be used by others, typically referred to in the metadata as an intellectual rights statement or a data use agreement. In the LTER Network, the most open and commonly applied scheme is the Creative Commons Attribution 4.0 International License (CC BY 4.0), which allows consumers to freely reuse, redistribute, transform, or build on the data, in whole or in part, provided that data sources are properly cited. Alternatively, providers may release their data to the public domain via the Creative Commons Public Domain Dedication (CC0); however, this waives the desired attribution requirement. Regardless, it is fundamentally important and a matter of professional ethics that scientists not only cite data but also communicate with data providers to prevent data misuse and encourage collaboration in the spirit of open science.