Data Policy
When you publish in GENETICS, you help to catalyze scientific advances by sharing your experimental reagents, results and interpretations. For these articles to have the greatest impact, authors need to make unique research materials and data freely available to other investigators (see GENETICS, 184: 1).
Data
All data that are necessary for confirming the conclusions presented in your manuscript, and that are not represented fully within its tables and figures, must be made publicly available. Data should be archived in a public repository or database managed by a third party. Please note that journal policy does not allow data to be made available solely upon request.
For studies that include genotypes and DNA sequences, the genotypes or sequences for all individuals should be provided in common formats, such as alignment, VCF, or plain-text files, and raw sequence reads should be deposited in a public repository. If publicly available data are used, the authors should provide details of which sequences or positions were included in the final analyses so that readers can replicate the results easily. Additionally, any new software reported in a manuscript must be made publicly available.
In rare cases, exceptions to this policy may be granted in response to a written description of the compelling reasons for withholding data; authors should include these reasons in their cover letter upon initial submission of the manuscript. All exceptions must be approved by the Editor in Chief.
Failure to comply with the data policy may result in your manuscript being returned without review.
Authors must include a Data Availability Statement to point editors, reviewers, and readers to their data (see below for instructions).
Which data should I archive online?
Where should I archive data?
Can I archive the data only on my personal or lab website?
What information should accompany the data?
What format should the data be in?
Do I have to provide software and code used for analysis?
Do I have to provide the simulated data I used for testing a method?
What about privacy concerns for human data?
What data should I provide for a quantitative trait study?
What if the data I'm using are proprietary?
How do I use published data in an ethical manner?
What if all of my data is represented in the manuscript itself?
Who should I contact if I have questions about the data policy?
Which data should I archive online?
You must archive online any and all data that cannot be represented fully within the tables and figures of the manuscript but that are necessary for confirming the conclusions presented (e.g., genome-scale data, phenotype screen results, nucleotide sequences, genetic mapping data, relational databases), as well as data that were used in the final analyses. For studies that include genotypes and DNA sequences, the genotypes or sequences for all individuals should be provided in a common format (in addition to depositing raw sequence reads in a public repository). These formats can be alignments, VCF, or plain-text files. Other researchers need such data in order to interpret, replicate, or build upon your work.
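As an illustration of the VCF option mentioned above, here is a minimal sketch in Python that writes per-individual genotypes as VCF text. The function name `write_minimal_vcf`, the sample names, and the variants are all hypothetical, and the header is reduced to the bare minimum; a real submission would normally come from a variant-calling pipeline and carry fuller metadata.

```python
import io


def write_minimal_vcf(handle, samples, variants):
    """Write genotypes as a minimal VCF 4.2 text stream.

    `variants` is a list of dicts with keys chrom, pos, id, ref, alt, and
    genotypes (one GT string such as "0/1" per sample, in sample order).
    """
    handle.write("##fileformat=VCFv4.2\n")
    handle.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    handle.write("\t".join(header + list(samples)) + "\n")
    for v in variants:
        if len(v["genotypes"]) != len(samples):
            raise ValueError(f"variant {v['id']}: expected {len(samples)} genotypes")
        # QUAL, FILTER, and INFO are left as "." (missing) in this sketch.
        fixed = [v["chrom"], str(v["pos"]), v["id"], v["ref"], v["alt"], ".", ".", ".", "GT"]
        handle.write("\t".join(fixed + list(v["genotypes"])) + "\n")


# Hypothetical example data (not from any real study).
samples = ["ind01", "ind02", "ind03"]
variants = [
    {"chrom": "2L", "pos": 10523, "id": "snp1", "ref": "A", "alt": "G",
     "genotypes": ["0/0", "0/1", "1/1"]},
    {"chrom": "2L", "pos": 20877, "id": "snp2", "ref": "C", "alt": "T",
     "genotypes": ["0/1", "./.", "0/0"]},  # "./." marks a missing call
]

buffer = io.StringIO()
write_minimal_vcf(buffer, samples, variants)
vcf_text = buffer.getvalue()
print(vcf_text)
```

Each individual gets its own column, so a reader can recover the genotype of every individual at every reported position.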
These datasets should be made available online at the time of publication in one of the ways described below.
Where should I archive data?
The preferred method is to archive the data with a public repository or database that is managed by a third party, with assurances of long-term stability, redundancy, and accessibility. Examples include repositories such as figshare, Dryad, and Zenodo. Specialized data should be deposited at appropriate archives; for example, gene sequences must be deposited with GenBank, EMBL-Bank, DDBJ, or SRA, and genome-wide gene expression data at GEO or ArrayExpress.
Specialized data repositories are also acceptable for certain types of datasets. Examples include but are not limited to CIMMYT's Dataverse, NCSU's Drosophila Genetics Reference Panel, and Jackson Laboratory databases. If you have questions about the suitability of an institutionally affiliated data repository, please email [email protected].
Can I archive the data only on my personal or lab website?
Not unless there are compelling reasons for such archiving. It is difficult to guarantee that personal or lab websites will be maintained as long-term data archives.
What information should accompany the data?
You should provide enough information to allow other researchers to understand and use the data and to replicate your analysis (in conjunction with the article). We encourage you to provide additional information and annotations that would be helpful for someone who is unfamiliar with the details of your experiment.
What format should the data be in?
The file format depends on the data type, but in general you should use the format most likely to be useful for computational analysis by a third party. For example, you should not use PDF (or other static) format for text-based data files (e.g., spreadsheets).
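To illustrate the difference between a static format and one suited to computational analysis, the sketch below writes a small phenotype table as tab-delimited plain text and reads it back. The helper names, column names, and values are hypothetical; only Python's standard `csv` module is used.

```python
import csv
import io


def write_tsv(handle, fieldnames, rows):
    """Write rows (a list of dicts) as tab-delimited text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(handle):
    """Read tab-delimited text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical phenotype records; "NA" marks a missing measurement.
fieldnames = ["sample_id", "line", "wing_length_mm"]
rows = [
    {"sample_id": "ind01", "line": "A", "wing_length_mm": "2.31"},
    {"sample_id": "ind02", "line": "A", "wing_length_mm": "NA"},
    {"sample_id": "ind03", "line": "B", "wing_length_mm": "2.47"},
]

buffer = io.StringIO()
write_tsv(buffer, fieldnames, rows)
tsv_text = buffer.getvalue()

# A third party can parse the file directly, with no manual re-entry.
round_trip = read_tsv(io.StringIO(tsv_text))
print(tsv_text)
```

The same table saved as a PDF would have to be retyped or scraped before anyone could analyze it.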
Do I have to provide software and code used for analysis?
If you have developed software or applications, or have used custom code that would be required by someone trying to replicate your analysis, you will need to make these available in a public repository managed by a third party, just as for other data types. You will also have to license your software, and the terms of the license need to be described in the Data Availability Statement at the end of your manuscript. We strongly encourage you to use an open-source license.
Do I have to provide the simulated data I used for testing a method?
Not necessarily, but you will need to describe the method used to simulate the data in enough detail to allow another researcher to replicate the simulation. For example, you could provide an overview of the method in the article text and upload as supplemental information the scripts that you used to generate the simulated data.
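A simulation script is easiest to replicate when its random seed and parameters are stated explicitly. The following is a minimal sketch of such a script, not a recommended simulation model: the function `simulate_genotypes`, the seed, and the allele frequencies are hypothetical, and the model (unlinked biallelic loci in Hardy-Weinberg proportions) is an assumption chosen only to keep the example short.

```python
import random


def simulate_genotypes(n_individuals, allele_freqs, seed):
    """Simulate allele counts (0, 1, or 2) at unlinked biallelic loci.

    Assumes Hardy-Weinberg proportions: each of the two alleles an
    individual carries is drawn independently with frequency p.
    """
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


# Hypothetical parameters; reporting these alongside the script lets
# another researcher regenerate exactly the same simulated data set.
SEED = 20240101
ALLELE_FREQS = [0.1, 0.5, 0.9]

genotypes = simulate_genotypes(n_individuals=5, allele_freqs=ALLELE_FREQS, seed=SEED)
for row in genotypes:
    print("\t".join(str(g) for g in row))
```

Because the generator is seeded, rerunning the script with the same parameters and the same Python version should reproduce the same output, so archiving the script can stand in for archiving the simulated data.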
What about privacy concerns for human data?
Data from human research participants should be made available in a manner compliant with the relevant IRB rules and restrictions and with assurance that the requisite informed consent was obtained. IRB numbers must be listed in the manuscript. Authors should submit data from GWAS and other studies of human participants that connect genotypes and phenotypes to dbGaP. If dbGaP does not accept the data, proof of the rejection must be provided.
If you request a data policy exception (see Data Policy Exception Request Procedure below), you must still provide sufficient supporting data (metadata) so the study can be used in future meta-analyses. For de-identified data: document the methods used to de-identify the data so it is clear to downstream users. Please use the standardized dbSNP format for all questionnaires and include detailed descriptions of phenotyping, as well as all analyzed SNP names. We encourage provision of other supporting data. We may request additional data during the review process. For example, reviewers may ask for a specific analysis and output to determine whether population structure has been properly accounted for.
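One simple de-identification step is to replace original sample identifiers with neutral labels. The sketch below shows one way this could be done and documented; the function names, the label scheme, and the records are hypothetical, and renaming identifiers alone may not satisfy your IRB's requirements.

```python
import random


def deidentify_sample_ids(sample_ids, rng=None, prefix="S"):
    """Map each original sample ID to a neutral label such as S0001.

    Labels are assigned in shuffled order so they carry no information
    about the original IDs or their sort order.
    """
    if rng is None:
        rng = random.SystemRandom()  # OS entropy; not reproducible by design
    unique_ids = sorted(set(sample_ids))
    rng.shuffle(unique_ids)
    width = max(4, len(str(len(unique_ids))))
    return {original: f"{prefix}{index:0{width}d}"
            for index, original in enumerate(unique_ids, start=1)}


def apply_mapping(rows, mapping, id_column="sample_id"):
    """Return copies of the rows with the ID column replaced by its label."""
    return [{**row, id_column: mapping[row[id_column]]} for row in rows]


# Hypothetical records (not from any real study).
rows = [
    {"sample_id": "clinicA_patient_17", "genotype_snp1": "0/1"},
    {"sample_id": "clinicA_patient_42", "genotype_snp1": "1/1"},
    {"sample_id": "clinicB_patient_03", "genotype_snp1": "0/0"},
]

mapping = deidentify_sample_ids([r["sample_id"] for r in rows])
public_rows = apply_mapping(rows, mapping)
for row in public_rows:
    print(row)
```

The `mapping` dictionary is the re-identification key and should stay private. What downstream users need is a description of the method, for example that identifiers were replaced by randomly ordered labels and that no other fields were altered.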
What data should I provide for a quantitative trait study?
For quantitative trait studies, including QTL mapping and GWAS, we ask that you provide all the data used in the analysis. This includes all genotypes, phenotypes, marker information (e.g., linkage map or list of SNP locations), and any population structure information. In addition, you should provide enough details of the analysis methods and software used such that another researcher could repeat the analysis steps using the raw data.
Please be sure to include a detailed description of the locations of these data in the Data Availability Statement at the end of your manuscript. We recommend using the Data Availability Statement from Stanley et al., 2017 as a reference.
What if the data I'm using are proprietary?
Proprietary data should be made freely available; they can be in a de-identified format if necessary. For de-identified data: document the methods used to de-identify the data so it is clear to downstream users. In rare cases, proposals to make raw data available only upon request will be considered. In such a case, authors must follow the Data Policy Exception Request Procedure outlined below.
Regardless of any exception granted for full genotype and/or phenotype data, authors must provide detailed summary statistics. For GWAS, these include p-values, estimated effects, allele frequencies, and other related measures for all SNVs. We also need the SNP identities in order to replicate a GWAS, but you can use any sample naming scheme you prefer to protect your data. For genomic selection papers, we need genotypes but not necessarily SNP identities. A minimum level of genetic information is required for evaluation of the manuscript and must include the genomic locations and specific alleles for all variants cited in the manuscript.
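As a rough illustration of what a GWAS summary statistics file could contain, the sketch below writes one tab-delimited row per variant with its identity, genomic location, alleles, allele frequency, estimated effect, and p-value. The column names, the inclusion of a standard error column, and all values are assumptions made for the example; they are not a format prescribed by the journal.

```python
import csv
import io

# Hypothetical column set; "se" stands in for "other related measures".
REQUIRED_COLUMNS = ["snp_id", "chrom", "pos", "effect_allele", "other_allele",
                    "effect_allele_freq", "beta", "se", "p_value"]


def allele_frequency(dosages):
    """Frequency of the effect allele from 0/1/2 dosages; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


def write_summary_statistics(handle, records):
    """Write one row per variant, checking that every required field is present."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        missing = [c for c in REQUIRED_COLUMNS if c not in rec]
        if missing:
            raise ValueError(f"{rec.get('snp_id', '?')}: missing fields {missing}")
        if not 0.0 <= rec["p_value"] <= 1.0:
            raise ValueError(f"{rec['snp_id']}: p-value outside [0, 1]")
        writer.writerow(rec)


# Hypothetical values; effects and p-values would come from your own analysis.
records = [
    {"snp_id": "snp1", "chrom": "2L", "pos": 10523,
     "effect_allele": "G", "other_allele": "A",
     "effect_allele_freq": allele_frequency([0, 1, 2, None]),
     "beta": 0.25, "se": 0.125, "p_value": 0.5},
]

buffer = io.StringIO()
write_summary_statistics(buffer, records)
summary_text = buffer.getvalue()
print(summary_text)
```

Reporting every tested variant, not only the significant ones, is what makes such a file usable in later meta-analyses.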
How do I use published data in an ethical manner?
While authors have a responsibility to make all the data supporting the conclusions described in their paper publicly available, people who use that data have a responsibility to use it ethically. Authors must:
- Cite the source of the data in manuscripts that result from its analysis, including a clear statement of the data source;
We urge users of published data to:
- Make the people who generated the data aware of its use;
- Offer data generators the opportunity to review manuscripts describing analysis of their data before submission, when appropriate;
- Offer authorship to data generators when appropriate and justified.
What if all of my data is represented in the manuscript itself?
Write a data availability statement that affirms this.
For example: The authors affirm that all data necessary for confirming the conclusions of this article are represented fully within the article and its tables and figures.
Data Policy Exception Request Procedure:
Authors must include the full rationale for withholding data in both the Cover Letter and the Data Availability Statement. Authors should prepare a Materials Transfer Agreement (MTA) that will guide the sharing of data after a request is granted. This should be included as a supplemental file. This document must address each of the following:
- Procedure for requesting access to the full data;
- Parties responsible for evaluating requests (an author cannot be the sole named individual responsible for ensuring data access);
- Criteria and procedure used to evaluate such requests;
- Description of any conditions for gaining access to the data;
- Expected response time to requests for data access;
- Description of what usage or types of requests would lead to a denial of access.
A previously approved exception and the relevant documentation can be found in the supplemental material of Cheng, 2021.
Who should I contact if I have questions about the data policy?
Data Availability
All articles published by GENETICS must include a Data Availability Statement at the end of the manuscript. Please thoroughly read the above Data Policy before writing the statement. Make sure to list the accession numbers or DOIs of any data you have placed in public repositories. List the file names and descriptions of any data you will upload as supplemental information. The statement should also include any applicable IRB numbers. You may include specifications for how to properly acknowledge or cite the data.
Please note: For the review process, you must agree to make data confidentially available to editors and reviewers. If accession numbers/DOIs are not available yet, please indicate in your Data Availability Statement where the data will be submitted. Accession numbers/DOIs must be available prior to publication but do not need to be available at initial submission.
Example 1: Strains and plasmids are available upon request. File S1 contains detailed descriptions of all supplemental files. File S2 contains SNP ID numbers and locations. File S3 contains genotypes for each individual. Sequence data are available at GenBank and the accession numbers are listed in File S3. Gene expression data are available at GEO with the accession number: GDS1234. Code used to generate the simulated data can be found at https://zenodo.org/record/123456.
Example 2: Strains and plasmids are available upon request. The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables.
If your manuscript contains complex trait data, please prepare a Data Availability Statement that clearly describes the locations of your genotype data, phenotype data, marker information (e.g., linkage map or list of SNP locations), and population structure information. We recommend using the Data Availability Statement from Stanley et al., 2017 as a reference.