The genome of the golden apple snail Pomacea canaliculata provides insight into stress tolerance and invasive adaptation

Abstract Background The golden apple snail (Pomacea canaliculata) is a freshwater snail listed among the top 100 worst invasive species worldwide and a noted agricultural and quarantine pest that causes great economic losses. It is characterized by fast growth, strong stress tolerance, a high reproduction rate, and adaptation to a broad range of environments. Results Here, we used long-read sequencing to produce a 440-Mb high-quality, chromosome-level assembly of the P. canaliculata genome. In total, 50 Mb (11.4%) repeat sequences and 21,533 gene models were identified in the genome. The major findings of this study include the recent explosion of DNA/hAT-Charlie transposable elements, the expansion of the P450 gene family, and the constitution of the cellular homeostasis system, which contributes to ecological plasticity in stress adaptation. In addition, the high transcriptional levels of perivitelline genes in the ovary and albumen gland promote the function of nutrient supply and defense ability in eggs. Furthermore, the gut metagenome also contains diverse genes for food digestion and xenobiotic degradation. Conclusions These findings collectively provide novel insights into the molecular mechanisms of the ecological plasticity and high invasiveness.

to get your login information. For security reasons, your password will be reset.
Please include a point-by-point within the 'Response to Reviewers' box in the submission system. Please ensure you describe additional experiments that were carried out and include a detailed rebuttal of any criticisms or requested revisions that you disagreed with. Please also ensure that your revised manuscript conforms to the journal style, which can be found in the Instructions for Authors on the journal homepage.
The due date for submitting the revised version of your article is 21 Jun 2018.
I look forward to receiving your revised manuscript soon.
Best wishes, Hans Zauner GigaScience www.gigasciencejournal.com Reviewer reports: Reviewer #1: In their manuscript Liu et al. reported the genome sequence of the golden apple snail Pomacea canaliculata. They constructed chromosomal-level genome assembly using HiSeq, PacBio, and Hi-C sequencing technologies. They also tested differential gene expression under various environmental stress, showing many genes are responded to maintain homeostasis. In addition, they sequenced gut metagenome of the snail for the first time, implying that microorganisms contribute to digestion and resistance to xenobiotics of the host animal.
I think the massive data provides fundamental information to understand the biology of the animal as well as molluscs, therefore the study is valuable to be published in the journal GigaScience after some corrections.
Overall, the methods are appropriate, but description and interpretation of the results look not sufficient in some points as shown below. P. 5, lines 94-96 "such as Califonia sea hare, Pacific oyster, Pearl oyster,..." should be "such as the Califonia sea hare, the Pacific oyster, the pearl oyster,..." There are many mistakes like this. I won't mention all of them. Please consult professional English editor before submitting the revision. Reply: We have corrected this mistake in the new submitted manuscript. P. 7, lines 148-150 "genes from seven related species..." In fact eight species including Pinctada fucata were analyzed in figures 2a and 4a. Takeuchi et al.(2016, Zoological Letters, 2:3) and Luo et al.(2015, Nature Communications, 6, 8301) should be referred for P. fucata and Lingula anatina genome data, respectively. Reply: We have corrected the species number. Because a new species is added into analysis, now the total species number is nine. The reference paper of the new species "Uliano-Silva M, Dondero F, Dan Otto T, Costa I, Lima NCB, Americo JA, et al. A hybrid-hierarchical genome assembly strategy to sequence the invasive golden mussel Limnoperna fortunei. Gigascience. 2017. doi: 10.1093/gigascience/gix128. " were also added at line 104 in the new submitted manuscript.
In addition, please carefully correct scientific names in Abbreviations and figures. "Lottia gigantean" should be "Lottia giganta" "Aplysia california" should be "Aplysia californica." "Lingula anatine" should be "Lingula anatina" Reply: we have corrected all the mistakes on scientific names in Abbreviations and figures. P. 9 178-179 From the results I could not understand how the idea that the "DNA/hAT-Charlie TEs... promote the potential plasticity in the stress adaptation" came. This hypothesis can be tested using the present RNA-seq data, by checking whether the TEs are up-regulated under the stresses. Reply: Transposons can insert into any genomic regions, which may change the gene regulations, or modify the gene structure thus form new functions. If a genome has high transposon activity, then it has high ability to adapt to the changing environment, so the recent explosion of DNA TEs may benefit the fast evolution of P. canaliculata in the recent history. There were several previous studies (Hua-Van A, Le Rouzic A, Boutin TS, Filée J, Capy P. The struggle for life of the genome's selfish architects. Biol Direct. 2011;6:19; Werren JH. Selfish genetic elements, genetic conflict, and evolutionary innovation. Proc Natl Acad Sci U S A. 2011;108:10863-70) on this issue that provides evidences that TEs can introduce small adaptive changes for a species. Using the RNA-seq data to resolve this question is good idea. In our understanding, TEs can't be transcribed and translated as an independent element, except for some low and random transcriptions which are likely to be no functions. So we analyzed the expression of 709 genes including DNA elements that restricted to the 4% peak inside the gene region, compared with the other genes that outside the 4% peak. Differentially expressed genes (DEG) were defined here by P-value smaller than 0.05 for comparison of treatments (heat, cold, heavy metal and air exposure) and control data. The percent of DEGs in the 4% peak were higher than those of genes outside the peak (10.2% higher for heat, 8.6% higher for cold, 8.6% higher for heavy metal, and 7.3% higher for air exposure). Among the DEGs in the 4% peak, about half are up-regulated and the other half are down-regulated. Moreover, the DEGs in the 4% peak were mainly enriched in cellular metabolic process, response to stimulus, localization and signaling by GO annotation. These results indicated that genes in the 4% peak were likely to be more active in the response of stimulus, promoting the potential plasticity in the stress adaptation. The figure and related context was added in the new manuscript.
P. 17 lines 346-354 "The rich phenotypic... in laboratory." These sentences should be move to Introduction. Reply: We have moved "The rich phenotypic... in the laboratory." into introduction part between line 71 and 80. P. 18 lines 380-381 "total messenger RNAs" Total RNA or messenger RNA? Reply: Here we mean "messenger RNA". The sentence was revised to be "In final, total RNAs were extracted from the stored tissues of P. canaliculata materials, and then mRNAs were pulled out by beads with poly-T for constructing cDNA libraries.  The title "Evolutionary genomic analysis between P. canaliculata and other molluscs" is not appropriate because Lingula is a brachiopod. Reply: The title of Figure 2 is changed to "Evolutionary genomic analysis of P. canaliculata", because our focus is the species of P. canaliculata, other species were used for comparison.

Figure 4
Method for molecular phylogeny construction of CYP genes should be described. Reply: The method was described in Figure 4 legend "The tree was constructed using the maximum likelihood method in MEGA7, and the branch length scale indicates the average number of residue substitutions per site".

Figure S1
Which K-mer size used?
Reply: here we used 17-mer, the K-mer size is 17.  Table S5 was corresponding to the color in the heatmap of figure 4b, Data in Table  S7 was corresponding to the color in the heatmap of figure 5b, so there is no need to add other heat map figures.

Table S9
What "Mean" and "SD" indicate? E-value of blast results? Please describe. Reply: "Mean" and "SD" indicate the mean and standard deviation of relative abundance of a phylum or a genus from the 6 gut microbiota samples. We also added a note under the Table. Reviewer 2: -[the reviewer has no specific comments to the authors at this point, but recommends careful improvements of language and grammar] Reply: We have improved the language and grammar, and polished the text by native speakers.
Reviewer #3: This manuscript presents a high-quality genome assembly for the snail P. canaiculata. Such genome and further analysis presented will contribute deeply for future studies of the molecular evolution and adaptation of molluscs, as well as to the study of the molecular mechanisms leading to -or involved with -invasive species success. I also point out the relevance of a first qualitative description of a high-depth gut microbiome for a snail. For such reasons, I recommend the publication of this manuscript. Nevertheless, I would like to recommend some essential revision prior publication.
First, the English has to be revised. I'll give a few examples bellow, and authors will find major marks in purple concerning specifically the need of English revision in the revised pdf attached. However, the entire manuscript would benefit from a native English speaker revision. Reply: we have revised the descriptions highlighted in the attached pdf, and also asked a native speaker to help polish the language.
Line 95: I would cite here also the draft genome of the invasive Limnoperna fortunei mussel. Reply: The golden mussel "Limnoperna fortunei" and the related article were added in the new manuscript.
Line 95: There is a new version of the Pearl oyster published. If analysis were performed with data cited in line 95, I would advise for updating the analysis with proteins from the new genome (Du X, Fan G, Jiao Y et al. The pearl oyster Pinctada fucata martensii genome and multi-omic analyses provide insights into biomineralization. Gigascience 2017;6(8):1-12). Reply: We have replaced the proteins data of Pinctada fucata to the latest version, and updated all the analysis in the new manuscript.
Line 100-101: Rephrasing is necessary as cellular homeostasis, color and nutrient of the eggs are not species-specific invasive characteristics. Reply: We revised "invasive characters" to "environmental adaptation characteristics".
Line 104-105: same argument as for lines 36-37. Some rephrasing starting from "interrupt transmission…" is necessary. Reply: We agree with the suggestion, and weakened the mood. The sentence is modified to "and provide a basis for interrupting the transmission of pathogenetic nematode parasites". Table S1: Table S1 would benefit of having 2 columns: one with (i) number of reads generated and (ii) total bp produced for each library, instead of having a column 'Data size' (and what G bp means?). Reply: We have made 2 columns in Table S1 according to the suggestions. One column refers to number of sequenced reads, the other column refers to number of sequenced bases.
Line 122: The ratio of genome coverage by reads used as input in the assembly? Rephrase it together with the sentences in lines 126-127, please. Reply: In this sentence "another important aspect for evaluating genome assembly is the ratio of genome coverage." (between line 132 and 133), we want to explain that the ratio of assembly coverage is important. In P. canaliculata, the genome size of 446 Mb was estimated by the distribution of k-mer frequency. In this assembly genome, ~98.6 % sequence has been assembled. In the sentence "we mapped the Illumina shotgun reads to the assembled reference genome. Significantly, 97% and 95% of the genome-derived and transcriptome-derived reads, respectively, could be aligned to the reference genome," (between line 136 and 137), we want to confirm the accuracy and no obvious bias for sequencing and assembly.
Line 123-124 and line 403: Please estimate and present the levels of heterozygosity using the illumina reads. Reply: We used K-mer with K-size 17 to estimate the genome heterozygosity based on algorithm from reference (Liu B, et al. Quantitative Biology 2013:arXiv:1308.2012). The estimated heterozygosity of P. canaliculata range from 1% to 2%. In addition, we also used FIndError (Gnerre S et al., 2011) in the Allpath-LG package to estimate the heterozygosity, the result is 1.75%, consistent with the first method. We have added it in revised manuscript. "With an estimated genome size of 446 Mb and genome heterozygosity between 1% and 2% based on the distribution of k-mer frequency." Line 415-416: "Then, the protein-coding sequences were mapped by RNA-seq data."please explain this sentence. Reply: To determine whether the predicted genes are expressed or not, we used the transcriptome data to map to the CDS of genes. The gene models were retained if they had at least one supporting evidence from UniProt database, InterProScan domain and RNA-seq data.
To be more clear, we have revised this sentence in the new manuscript: "Then, these gene models were annotated by RNA-seq data, UniProt database and InterProScan software".
Line 163: Withdraw "and so on". Reply： "and so on" is removed.
Lines 146-163: To start understanding if the genome composition itself -and not only regulation of gene expression -can play a major role in the success of invasive species, I would advise to compare gene family expansions and contractions between the genomes of two invasive mollusks, which is now possible once the draft genome of L. fortunei is available (GigaScience doi: 10.1093/gigascience/gix128.). Further discussion about the presence -or lack thereof -of common expansions and contractions of gene families would be a great contribution. Such gene families could be further investigated for their roles in the expression of phenotypes related to invasive ecology and behaviour. I would strongly suggest for a comparative analysis of P. canaliculata and L. fortunei protein sets leading to a new Figure S4 and brief discussion on the findings. Reply: We agree with the reviewer's comments. In the revised version, we added the genome data of L. fortune to re-construct the orthoFinder ortholog and paralog gene families. Then, we identified the common expanded gene families both in P. canaliculata and L. fortune. The functions of these gene families are mainly enriched in signal transduction, replication and repair, Translation, glycan biosynthesis and metabolism, Lipid metabolism, endocrine, immune and nervous system. And we have revised the results in Figure S4.