Background: The Chinese giant salamander (CGS) is the largest extant amphibian species in the world. Owing to its evolutionary position and four peculiar life phenomena (longevity, starvation tolerance, regenerative ability, and hatching without sunshine), it is an invaluable model species for research. However, progress in these fields has been limited by a lack of genomic resources, because its huge genome of ∼50 GB is extremely difficult to assemble.

Results: We report the sequenced transcriptome of more than 20 tissues from adult CGS using Illumina HiSeq 2000 technology, obtaining a total of 93,366 non-redundant transcripts with a mean length of 1,326 bp. We developed, for the first time, an efficient pipeline to construct a high-quality reference gene set for CGS and obtained 26,135 coding genes. BUSCO and homology assessments showed that our assembly captured 70.6% of vertebrate universal single-copy orthologs, and that this coding gene set had a higher proportion of complete CDS and was of comparable quality to the protein set of the Tibetan frog.
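
As an illustration of how summary figures such as a non-redundant transcript count and a mean transcript length can be derived from an assembly, the following is a minimal sketch in Python. It is not the pipeline used in this study: the function names `read_fasta` and `summarize` are hypothetical, the toy sequences are invented, and redundancy is assumed here to mean exact sequence identity, which is much simpler than the clustering a real assembly workflow would apply.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Drop exact duplicate sequences, then report count and mean length.

    Assumption: "non-redundant" is approximated by exact, case-insensitive
    sequence identity; this is an illustrative simplification only.
    """
    unique = {}
    for header, seq in records:
        unique.setdefault(seq.upper(), header)  # keep the first header seen
    lengths = [len(seq) for seq in unique]
    return {
        "n_transcripts": len(lengths),
        "mean_length_bp": mean(lengths) if lengths else 0,
    }


# Toy input (invented): t1 and t2 are the same sequence in different case.
toy = [">t1", "ATGGCC", "TTA", ">t2", "atggccTTA", ">t3", "ATGA"]
stats = summarize(read_fasta(toy))
print(stats)
```

On a real assembly, the same functions could be applied to an open FASTA file handle in place of the toy list.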

Conclusions: These high-quality data provide a valuable reference gene set for subsequent research on CGS. In addition, our strategy of de novo transcriptome assembly and protein identification is applicable to similar studies.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data