Whole-genome resequencing reveals signatures of selection and timing of duck domestication

Abstract
Background: The genetic basis of animal domestication remains poorly understood, and systems with substantial phenotypic differences between wild and domestic populations are useful for elucidating the genetic basis of adaptation to new environments as well as the genetic basis of rapid phenotypic change. Here, we sequenced the whole genome of 78 individual ducks, from two wild and seven domesticated populations, with an average sequencing depth of 6.42X per individual.
Results: Our population and demographic analyses indicate a complex history of domestication, with early selection for separate meat and egg lineages. Genomic comparison of wild to domesticated populations suggests that genes that affect brain and neuronal development have undergone strong positive selection during domestication. Our FST analysis also indicates that the duck white plumage is the result of selection at the melanogenesis-associated transcription factor locus.
Conclusions: Our results advance the understanding of animal domestication and selection for complex phenotypic traits.

Your revised manuscript "Whole-genome resequencing reveals signatures of selection and timing of duck domestication" (GIGA-D-17-00301R1) has been assessed again by our reviewers.
I am happy that the reviewers feel that many of their previous comments have been addressed and the manuscript has improved. However, some issues remain to be clarified, and I urge you to fully address the latest comments in a second revised manuscript.
Please see the reviewers' reports below.
Comment: Please pay particular attention to the comments of reviewer 1 regarding availability of population genetics raw data, coordinates of sweeps, scripts, etc, as well as full step-by-step description of all wet and dry lab protocols. As I explained in my previous decision letter, reproducibility of methods and full data availability are of utmost importance for acceptance in GigaScience.
Reply: Many thanks for your comment. Following your suggestion and that of reviewer 1, all population genetic raw data and command scripts have been submitted to the GigaDB database. We have also added descriptions of all wet and dry lab protocols to the current manuscript; please see the specific replies below.
As mentioned previously, the protocols.io platform is a very convenient way to share experimental protocols, and I recommend that you consider this option. Please do let me know if you have questions regarding how we can integrate protocols.io entries with your manuscript.
Our data curators will contact you to prepare the GigaDB set that will be posted alongside your manuscript, if it is accepted.
Please include a citation to your GigaDB dataset (including the DOI link) in your reference list, and cite this in the data availability section and elsewhere in the manuscript, where appropriate.
Please follow this example format for the reference: [xx] Author1 N, Author2 N, AuthorX N. Supporting data for "Title of your manuscript". GigaScience Database 2018. http://dx.doi.orgxxxxxxxxxxxx (We will replace the dummy doi (xxxx) with the final version prior to acceptance).
Once you have made the necessary corrections, please submit a revised manuscript online at: https://giga.editorialmanager.com/ If you have forgotten your username or password please use the "Send Login Details" link to get your login information. For security reasons, your password will be reset.
Please include a point-by-point response within the 'Response to Reviewers' box in the submission system. Please also ensure that your revised manuscript conforms to the journal style, which can be found in the Instructions for Authors on the journal homepage.
The due date for submitting the revised version of your article is 15 May 2018.
We look forward to receiving your revised manuscript soon.
Best wishes,
Hans Zauner
GigaScience
www.gigasciencejournal.com

Reviewer reports:
Reviewer #1: In my opinion, this revision adequately answers most of my comments. The manuscript has also improved with the answers to the other reviewer.
I have only a few remaining comments. The most serious one is about data availability and protocols.
Comment: The revision comes with better data availability. VCF files of variants are included, plus a couple of perl scripts used to process them. However, full population genetic statistics and sweep locations still seem to be missing. Scripts for running the bioinformatic tools are not included. The description of the PCR follow-up of variants has been expanded. However, the description does not include the full protocol, and neither does the description of any of the other laboratory methods. This level of detail is about the standard in the field, but it does not seem to live up to the policies of the journal.
Reply: Many thanks for your positive comments, and apologies for any inadequate descriptions. All population genetic raw data and command scripts have been submitted to the GigaDB database. We used a sliding-window method for FST calculation in our sweep analysis, as this approach is more robust and informative for genome-wide evaluation. With this approach, one window may contain several genes, and a very long gene may span multiple overlapping windows. We therefore report gene locations in place of sweep locations, and have added this information to the current manuscript; please see supplemental tables S5 and S8.
Comment: A couple of times (the justification for the mix of sequence coverages, and the detail about the origin of the ducks), the reply to reviewers contains useful information that was not incorporated in the manuscript. In my opinion, the Methods should include this information, in particular as much detail as possible about the origin of the animals.
Reply: Many thanks for your suggestion. We have added the justification for the mix of sequence coverages to the Methods section of the current manuscript; please see lines 486-490. We have also detailed the origin of our samples; please see lines 468-474.
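The relationship between overlapping sliding windows and genes described in the reply on sweep locations above can be illustrated with a minimal sketch. This is not the authors' script: the window size, step, chromosome length, and gene coordinates are hypothetical toy values, and the function names are introduced here only for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end)


def sliding_windows(chrom_len: int, size: int, step: int) -> List[Interval]:
    """Tile a chromosome with overlapping half-open windows [start, end)."""
    return [(s, min(s + size, chrom_len)) for s in range(0, chrom_len, step)]


def genes_per_window(
    windows: List[Interval], genes: Dict[str, Interval]
) -> Dict[Interval, List[str]]:
    """Map each window to the sorted names of genes that overlap it."""
    return {
        w: sorted(
            name
            for name, (g_start, g_end) in genes.items()
            if g_start < w[1] and g_end > w[0]  # standard interval-overlap test
        )
        for w in windows
    }


# Hypothetical toy data (not taken from the manuscript).
windows = sliding_windows(chrom_len=100, size=40, step=20)
genes = {"geneA": (5, 15), "geneB": (25, 35), "longGene": (30, 95)}
hits = genes_per_window(windows, genes)

for window, names in hits.items():
    print(window, names)
```

In this toy case the first window contains three genes, while the long gene overlaps every window, which is why a per-gene table can be easier to interpret than a per-window one.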

Minor comments
Comment: The reply to reviewers describes the variant filtering as "extremely strict". In fact, it seems to be mostly the default starting criteria suggested by GATK developers in their "best practices" (with a "QUAL" cutoff and a higher "QD" cutoff). How were these filter settings chosen? Are they actually "extremely strict"?
Reply: Many thanks for your questions. All variants were filtered with the "hard filter" criteria suggested by the GATK developers. The "extremely strict" criteria applied only to identifying variants associated with the white plumage trait: the variant allele frequency had to be 0 in all white ducks and 1 in all non-white ducks, or 1 in all white ducks and 0 in all non-white ducks. In other words, the variant had to be completely associated with the phenotype to pass our strictest threshold.
Comment: Line 247: What does "completely associated with selection" mean in this context?
Reply: Thanks for your question. The statement "The duck white plumage is completely associated with selection at the MITF locus" means that the mutations at this locus were completely associated with the white plumage phenotype.
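The complete-association criterion described in this reply can be written as a short check. This is a minimal sketch rather than the authors' implementation: it assumes per-individual variant allele frequencies coded as 0, 0.5, or 1 for diploid genotypes, and the function name is hypothetical.

```python
from typing import Sequence


def is_completely_associated(
    white_freqs: Sequence[float], nonwhite_freqs: Sequence[float]
) -> bool:
    """Return True if the variant allele is fixed in one group and absent in the other.

    Each value is one individual's variant allele frequency
    (assumed coding: 0 = homozygous reference, 0.5 = heterozygous, 1 = homozygous variant).
    """
    if not white_freqs or not nonwhite_freqs:
        return False  # cannot assess association without both groups
    absent_in_white = all(f == 0 for f in white_freqs) and all(
        f == 1 for f in nonwhite_freqs
    )
    fixed_in_white = all(f == 1 for f in white_freqs) and all(
        f == 0 for f in nonwhite_freqs
    )
    return absent_in_white or fixed_in_white


# Hypothetical genotypes: a single heterozygous white duck breaks the association.
print(is_completely_associated([1, 1, 1], [0, 0]))    # True
print(is_completely_associated([1, 0.5, 1], [0, 0]))  # False
```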
Comment: Lines 252-253: In what sense did the PCR primer design fail? Were you unable to amplify the region, amplify specifically, or unable to find primers that lived up to your quality criteria? I fully understand that PCR primer design fails occasionally, but I think a more specific description would be useful.
Reply: We were unable to design suitable primers to amplify this region. We have added this explanation to the current manuscript; please see line 270.
Reviewer #2: The revised version of the manuscript entitled, "Whole-genome resequencing reveals signatures of selection and timing of duck domestication" tackles the genomic question of domestication. The authors have done much to improve the manuscript. While most of my comments are now minor, there are a few additional requests that would be nice to see incorporated in order to strengthen the manuscript. I believe that the paper will be ready for submission if the authors incorporate all/most comments (See below).
Comment: INTRODUCTION/DATA DESCRIPTION: I think the introduction is much improved. In addition to minor comments below, I would still like to see the authors develop at least one hypothesis as to what genes/genetic regions may be playing a role in the meat/egg domestication process of these ducks. Alternatively (or in addition to), I would like to see a hypothesis regarding what they think some of the differences may be between wild and domesticated populations.
Reply: Thank you very much for your positive comments. Respectfully, the advantage of comparative genomic studies such as ours is that they are agnostic screens of the entire genome, with no need to develop specific hypotheses a priori. Previous similar studies of domestication (including Rubin et al. Nature 2010; Vonholdt et al. Nature 2010; Montague et al. PNAS 2014, among many others) have used these approaches to identify regions of the genome affected by artificial selection without a priori hypotheses. We adapted these approaches to the study of ducks here, with the broad aim of identifying whether ducks were domesticated once (null hypothesis) or separately for egg and meat breeds (alternative hypothesis). Moreover, we assess the effect of domestication on genes related to plumage and neuroanatomy. We respectfully suggest that developing further post hoc hypotheses to fit our results at this point would be disingenuous and would defeat the purpose of these sorts of agnostic screens.
Rubin, C. J., et al. (2010). "Whole-genome resequencing reveals loci under selection during chicken domestication." Nature 464(7288): 587-591.
Vonholdt, B. M., et al. (2010). "Genome-wide SNP and haplotype analyses reveal a rich history