Novel transcriptome resources for three scleractinian coral species from the Indo-Pacific

Abstract Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the Coral Triangle and Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92–102 million raw Illumina HiSeq 2 × 150 bp reads. Host-specific assemblies contained 65 460–72 405 contigs, representing 26 693–37 894 isogroups (∼genes) with an average N50 of 2254 bp. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. Transcriptomes contained 93.1–94.3% of EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98–91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with novel morphology.


Data Description
Background
A growing body of genomic information for reef-building corals has resolved phylogenetic relationships and helped reveal how this unique taxonomic group calcifies and responds to thermal stress [1–4]. Such information is critical for understanding the adaptive capacity of these ecologically important organisms, particularly in an era of global climate change [5]. Transcriptomic and/or genomic resources are currently available for 23 scleractinian species representing 14 genera and 11 families [1,4,6–16]. We assembled the transcriptomes of 3 scleractinian coral species: the congeners Galaxea astreata and G. acrhelia, and Goniopora columna. This is the first sequence resource for Goniopora spp. and extends the phenotypic diversity represented by coral transcriptomic resources to include submassive (G. astreata) and columnar (G. columna) morphologies [17], which should facilitate additional insight into the evolutionary history of this taxonomic order.
To generate more comprehensive reference transcriptomes, 4–5 replicate cores of a single colony were subjected to a 2-week temperature stress experiment as described in [18], and paired samples from control (27°C) and heat (31°C) treatments were snap-frozen in liquid nitrogen on day 2, day 4, and day 17 (Table 1; note that for G. acrhelia, heat-treated fragments were only included for day 4 and day 17). Samples were crushed in liquid nitrogen, and total RNA was extracted using an Aurum Total RNA mini kit (Bio-Rad, Irvine, CA, USA). RNA quality and quantity were assessed using the NanoDrop ND-200 UV-Vis Spectrophotometer (Thermo Scientific, Waltham, MA, USA) and gel electrophoresis.

Samples and sequencing
For transcriptome sequencing, RNA samples from replicate fragments were pooled in equal proportions, and ∼1 μg was shipped on dry ice to the Oklahoma Medical Research Foundation NGS Core, where Illumina TruSeq Stranded libraries were prepared and sequenced on 1 lane of the Illumina HiSeq 3000/4000 to generate 2 × 150 bp paired-end (PE) reads.

Transcriptome assembly and annotation
Sequencing yielded 92–102 million raw PE reads (Table 1). The fastx toolkit [19] was used to discard reads <50 bp or having a homopolymer run of "A" ≥9 bases, to retain reads with a PHRED quality of at least 20 over 80% of the read, and to trim TruSeq sequencing adaptors. Polymerase chain reaction duplicates were then removed using a custom Perl script [20]. The remaining high-quality filtered reads (26–35 million paired reads, 4–6 million unpaired reads) (Table 1) were assembled with Trinity v. 2.0.6 (Trinity, RRID:SCR_013048) [21], using the default parameters and an in silico read normalization step, at the Texas Advanced Computing Center at the University of Texas at Austin.
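As an illustration only, the three read-level criteria above (minimum length, poly-A run, and quality fraction) can be expressed as a single predicate. This is a minimal sketch and not the fastx toolkit implementation: the function names are hypothetical, quality strings are assumed to use the Phred+33 encoding, and "over 80% of the read" is interpreted as "at least 80% of bases".

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming the Phred+33 offset."""
    return [ord(ch) - 33 for ch in qual_string]


def passes_filters(seq, qual_string, min_len=50, a_run=9, min_q=20, min_frac=0.8):
    """Return True if a read survives the three filters described in the text.

    Hypothetical helper: thresholds mirror the values quoted above, but the
    exact behaviour of the original tools may differ.
    """
    # 1. Discard reads shorter than min_len bases.
    if len(seq) < min_len:
        return False
    # 2. Discard reads containing a run of a_run or more consecutive "A".
    if "A" * a_run in seq.upper():
        return False
    # 3. Keep reads in which at least min_frac of bases have quality >= min_q.
    quals = phred33(qual_string)
    high_quality = sum(q >= min_q for q in quals)
    return high_quality >= min_frac * len(quals)
```

For example, a 60-base read with uniformly high quality passes, whereas the same read with one third of its bases at low quality, or with a trailing run of nine "A" bases, is rejected.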
Since corals are "holobionts" composed of host, Symbiodinium, and other microbial components, the resulting assemblies were filtered to identify the host component following the protocol described in Kitchen et al. (2015) [4], with one modification. Briefly, short contigs (<400 bp) were removed, and a hierarchical series of BLAST searches against potential contaminants was conducted. First, assemblies were compared to the most complete Cnidarian rRNA database (SILVA: ABAV01023297, ABAV01023333) [22] using BLASTn [23], and good matches (bit score >45) were removed. Next, transcriptomes were compared to a Cnidarian mitochondrial genome using BLASTn (Acropora tenuis, NCBI: NC_003522.1) [24], again discarding contigs with match bit scores >45. The taxonomic origin of the remaining contigs was identified using a series of BLASTx searches against the most complete coral and Symbiodinium gene models (coral: Acropora digitifera, adi v1.01 prot, [14]; Symbiodinium: S. kawagutii, Symbiodinium kawagutii.0819.final.gene.pep, [25]) and NCBI's nonredundant (nr) protein database (downloaded 25 July 2016) [23]. For a contig to remain in the host-specific assembly, it had to both (1) match (E value ≤ 10⁻⁵) a gene in the coral proteome more closely than the Symbiodinium proteome and (2) either match a metazoan sequence or have no match in the nr database. In addition, contigs with no match to either proteome were retained if their best match in the nr database search was to a Cnidarian, a slightly less stringent criterion than that used by Kitchen et al. (2015) [4]. Annotation of host transcriptomes was performed following the protocols and scripts described in [26]. Host contigs were assigned putative gene names and gene ontologies using a BLASTx search (E value ≤ 10⁻⁴) against the UniProt Knowledgebase Swiss-Prot database [27].
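The host-retention rule described above for the BLASTx stage can be summarized as a small decision function. The sketch below is illustrative and is not the authors' script: the `ContigHits` record and its field names are hypothetical, "matches more closely" is assumed to mean a higher bit score, and the nr result is assumed to have been pre-classified into one of a few taxonomic labels.

```python
from dataclasses import dataclass
from typing import Optional

E_CUTOFF = 1e-5  # E value threshold quoted in the text


@dataclass
class ContigHits:
    """Best-hit summary for one contig (hypothetical structure)."""
    coral_evalue: Optional[float] = None   # None means no hit to the coral proteome
    coral_bits: float = 0.0
    symb_evalue: Optional[float] = None    # None means no hit to the Symbiodinium proteome
    symb_bits: float = 0.0
    nr_best: Optional[str] = None          # "cnidarian", "metazoan", "other", or None (no nr hit)


def is_host(hits: ContigHits) -> bool:
    """Apply the retention rule; assumes 'closer match' means higher bit score."""
    coral_hit = hits.coral_evalue is not None and hits.coral_evalue <= E_CUTOFF
    symb_hit = hits.symb_evalue is not None and hits.symb_evalue <= E_CUTOFF

    if coral_hit and (not symb_hit or hits.coral_bits > hits.symb_bits):
        # Coral-like contig: keep if nr agrees (metazoan, incl. cnidarian) or is silent.
        return hits.nr_best in (None, "metazoan", "cnidarian")
    if not coral_hit and not symb_hit:
        # No match to either proteome: keep only if the best nr hit is a cnidarian.
        return hits.nr_best == "cnidarian"
    return False
```

Under these assumptions, a contig with a strong coral hit and no nr match is kept, while a contig whose Symbiodinium hit outscores its coral hit is dropped regardless of its nr result.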
EuKaryotic Orthologous Groups (KOG) annotations were assigned using a BLAST search against the core eukaryotic gene set from the CEGMA pipeline (CEGMA, RRID:SCR_015055) [28] and the WebMGA server (WebMGA, RRID:SCR_011951; [29]) [30], and Kyoto Encyclopedia of Genes and Genomes (KEGG) IDs were assigned using the KAAS server [31,32]. The stats.sh command of the BBMap package [33] was used to calculate the GC content of host transcriptomes. Transcriptome completeness was evaluated through comparison to the Benchmarking Universal Single-Copy Orthologs v. 2 (BUSCO, RRID:SCR_015008) [34] set for metazoans using the gVolante server [35,36].
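For readers unfamiliar with the summary statistics reported for these assemblies, the following sketch shows how GC content and N50 (the figure cited in the Abstract) are conventionally computed from a set of contigs. It is not the BBMap code; the function names are hypothetical, and ambiguous bases such as "N" are assumed to be excluded from the GC denominator, which may differ from how stats.sh treats them.

```python
def gc_content(seqs):
    """Fraction of G/C among unambiguous (A, C, G, T) bases across all contigs."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return gc / (gc + at)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs")
```

With contig lengths 2, 2, 2, 3, 3, 4, 8, and 8 (32 bases in total), the two longest contigs already account for half of the assembly, so the N50 is 8.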

Re-use potential
These coral host-specific assemblies are sufficient for use as transcriptome references for Tag-based RNAseq (TagSeq) [37], a cost-effective method that was recently shown to be more accurate at quantifying gene expression levels than traditional RNAseq [38]. The fasta files and associated annotation files have been formatted for direct use in the TagSeq read mapping [39] and GO-MWU analysis pipelines [40].

Funding
Funding for this study was provided by a National Science Foundation International Postdoctoral Research Fellowship, DBI-1401165, to C.D.K. and funding from the Australian Institute of Marine Science to C.D.K. and L.K.B.