Phylotranscriptomic Insights into the Diversification of Endothermic Thunnus Tunas

Abstract Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas are commercially important fishes worldwide, and some populations are threatened. They are renowned for their endothermy, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tunas, such as yellowfin tuna, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than most temperate bluefin tuna populations, providing resiliency in the face of large-scale industrial fisheries. Despite the importance of these traits to not only fisheries but also habitat utilization and responses to climate change, little is known of the genetic processes underlying the diversification of tunas. In collecting and analyzing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation is associated with the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme that contributes to thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation, and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.


Introduction
The Thunnus tuna clade consists of some of the most commercially important fish species in the world. The genus includes the three iconic bluefin species, which have all in recent times undergone precipitous population declines, some now recovering with careful management plans, but all remain the target of fisheries owing to their high commercial value (Matsuda et al. 1998;Safina and Klinger 2008;MacKenzie et al. 2009;ISC 2016). By contrast, other Thunnus species sustain huge global fishery yields; with the yellowfin tuna, Thunnus albacares, in particular comprising the seventh highest global landings of all fish species in 2014 (FAO 2016). Although the eight Thunnus species are thought to have diverged rapidly (Miya et al. 2013;Santini et al. 2013;D ıaz-Arce et al. 2016), considerable ecological and physiological variability exists within the clade (Bernal et al. 2017).
All tuna species (also including the genera Euthynnus, Auxis, Katsuwonus, and Allothunnus) are regionally endothermic (supplementary fig. S1, Supplementary Material online). Unlike other teleosts, much of their aerobic red muscle is located near the center of the body. The evolution of vascular countercurrent heat exchangers has enabled the conservation of heat generated by metabolism and contraction in this muscle, allowing the maintenance of elevated tissue temperatures (Carey and Teal 1966;Block and Finnerty 1994;Sepulveda et al. 2008). The Thunnus tunas are particularly notable among the tunas as they have diversified rapidly and show a wide degree of variability in their distributions and thermal tolerances. Taxonomists initially split tunas of the genus Thunnus into two subgenera based on morphological characters: the tropical Neothunnus (including yellowfin, blackfin [T. atlanticus], and longtail [T. tonggol]) and the more  Gibbs and Collette 1967). These cold-tolerant Thunnus species, unlike the three species of the Neothunnus subgenus, have additional heat-exchangers around their viscera, enabling retention of heat generated during digestion and in some cases the brain and eye regions (Linthicum and Carey 1972). Increases of visceral temperature postdigestion that may increase speed of digestion and result in a larger thermal excess in cooler high latitude (Whitlock et al. 2015). The albacore tuna also extends its range into high-latitude waters, preferring waters where the sea surface temperatures are as low as 14 C (Arrizabalaga et al. 2015). The bigeye tuna occupies tropical and subtropical waters, but spends considerable time diving to deep mesopelagic resources, encountering cool waters, before returning to the surface to warm up (Holland and Sibert 1994;Schaefer and Fuller 2010). Although it was placed in the Thunnus subgenus, it shares characteristics with the tropical Neothunnus and was considered an intermediate between the two (Gibbs and Collette 1967). Electronic tagging has shown that the three bluefin tuna species are especially cold-tolerant, feeding as large adults in high latitudes in subpolar waters where sea temperatures can be as low as 9 C at the surface and 0-2 C at depth (Bestley et al. 2009;Block et al. 2011;Arrizabalaga et al. 2015;Wilson et al. 2015).
The capacity of bluefin tunas to spend prolonged periods of time in cool subpolar and temperate waters has been hypothesized to be associated with increases in cardiac capacity and aerobic metabolism. Among Thunnus species with measurements to date, bluefin tunas have been shown to have elevated cardiac capacities particularly in excitation-contraction coupling (Landeira-Fernandez et al. 2004) and an increased capacity to maintain cardiac ion channels conductance at low temperatures (Galli et al. 2011). The deep red muscle of yellowfin tuna is known to be specialized to operate at high temperatures, being very sensitive to temperature decrease (Altringham and Block 1997). This suggests that even at low ambient water temperatures, elevated temperatures must be maintained in deep muscle to maintain near-optimal function. Similar physiological studies have not been carried out in any bluefin tuna species, and so it is not known whether there deep red muscle is equally thermally sensitive. However, large Atlantic bluefin tuna are known to maintain stable elevated temperatures in deep muscle at ambient water temperatures of 9-17 C (Stevens et al. 2000). This has led authors to suggest that the capacity of bluefin tunas to occupy high latitudes with low sea surface temperature is associated with increased endothermic capacity , as higher thermal gradients between deep muscle and the ambient water must be maintained for prolonged periods. At low temperatures below their thermal optimum, the metabolic rate of Pacific bluefin tuna increases. This is atypical of ectothermic fish, where metabolic rate would decrease with temperature, but typical of endothermic animals ). This suggests that upregulation of aerobic metabolism may be associated with thermoregulation and endothermy in these species.
Another aspect in which the Thunnus vary is in their reproductive biology. The three species of the subgenus Neothunnus, alongside the bigeye, remain in warmtemperate to tropical waters year-round. These four species are thought to spawn throughout much of the year. By contrast, the albacore and bluefin tunas have more restricted spawning seasons, as they spend much of the year feeding in productive higher-latitude waters, returning to subtropical or tropical seas to spawn for short durations (Schaefer 2001;Muhling et al. 2017). Higher fecundities and generally faster generation times of tropical tunas (Juan-Jord a et al. 2013) have counterbalanced enormous fishing pressure (Juan-Jord a et al. 2015), although bigeye and yellowfin populations are, in some regions, decreasing in size (table 2). Life-history traits are therefore critical to the survival of tunas in modern oceans where humans have increased predation pressure (Kroodsma et al. 2018). Despite the relevance to high-stakes fisheries, little is known of the evolutionary processes to have driven this physiological and ecological diversification of the Thunnus clade.
Estimates of phylogenetic relationships using partial genomic data (D ıaz-Arce et al. 2016) and mitochondrial sequence data (Chow and Kishino 1995;Bayona-V asquez et al. 2017) have suggested that the three bluefin tuna species are paraphyletic. In the partial genomic data phylogeny, the southern bluefin is sister to the warm-water tuna clade (supplementary fig. S2, Supplementary Material online). By contrast, in the mitochondrial genome phylogenies, Atlantic and southern bluefins are sister species, but Pacific bluefin is sister taxa to the albacore (supplementary fig. S2, Supplementary Material online). Prior to the advent of mitochondrial phylogenetics, Pacific and Atlantic bluefin tunas were considered a single species (Chow and Kishino 1995;Collette et al. 2001). These northern bluefins are thought to be only weakly differentiated in the nuclear genome (Chow et al. 2006;D ıaz-Arce et al. 2016). This mitochondrial-nuclear discordance has been used to hypothesize introgression between albacore and Pacific bluefin tuna. However, this may also be driven by incomplete lineage sorting (ILS; Toews and Brelsford 2012), which has not been tested. Rapid radiations are generally associated with a high degree of gene-tree discordance, where different genes have conflicting topologies (Pamilo and Nei 1988). This may be a result of both ancestral hybridization events and failure of ancestral genetic variation to sort in-between speciation events, resulting in ILS (Maddison 1997). Given the rapid divergence and large population sizes of Thunnus tuna, ILS is likely to have generated significant gene-tree discordance. This may have misled phylogenies of the Thunnus tuna to date, as "supermatrix" techniques they utilized may be inaccurate when genealogical discordance is high (Kubatko et al. 2007;Pirie 2015;D ıaz-Arce et al. 2016;Mendes and Hahn 2017). Genealogical discordance may also explain why the evolution of traits in a rapid radiation may not correlate with monophyletic relationships in the species tree. This may be explained by processes such as parallel selection on standing variation or introgression (Hahn and Nakhleh 2016;Pease et al. 2016).
Diversification of Endothermic Tunas . doi:10.1093/molbev/msy198 MBE Here, we used an RNA-seq data set consisting of multiple individuals of each Thunnus species to explore the evolutionary processes underlying their diversification. The first aim of our study was to clarify phylogenetic relationship among the Thunnus species. The second aim was to assess how hybridization, selection on standing variation, and de novo mutation have shaped the Thunnus radiation. We find that de novo mutation has played a role in the evolution of the tropical group and that selection on standing variation has driven the phenotypic divergence of cold tolerance in bluefin tunas. This includes bluefin-specific variants in genes associated with key metabolic and thermogenic functions.

Results and Discussion
To elucidate the evolutionary history of Thunnus, and to learn more about the evolution of endothermy, specifically in the bluefin tuna and visceral endotherm groups, we collected RNA-sequence data for 25 individual tunas, supplementing NCBI Short Read Archive data to reach a total of 46 individuals (supplementary table S1, Supplementary Material online). This transcriptomic data set included at least two individuals from each of the eight Thunnus species, plus the skipjack tuna, Katsuwonus pelamis, as an outgroup. We note that there was some variation in tissue type used among the 46 individuals, which may have reduced coverage of tissuespecific genes. We first generated a merged de novo assembly based on 102 unique assemblies from skeletal muscle (red and white muscle) and heart tissue (compact ventricle, spongy ventricle, and atrium) of three individual Pacific bluefin tuna. Multiple tissue types were used to provide a more complete reference assembly. The merged assembly comprised 48,648 transcripts, corresponding to 29,556 genes. This merged assembly was more complete and had less redundancy than any of the individual assemblies (complete sequences for 89.1% of a bony fish single-copy ortholog set [see Materials and Methods]; supplementary table S2 and fig. S3, Supplementary Material online). Therefore, sequence data from each of the 46 individuals were mapped and genotyped against this merged reference transcriptome.

Introgression Evident in Mitochondrial, but Not Nuclear Genomes of Tunas
Using either coalescence or concatenated-gene (supermatrix) phylogenetic analyses, we inferred the same phylogeny. This was consistent using coalescent trees based on either gene trees with poorly supported branches collapsed into hard polytomies or not, and with supermatrix trees based on either 4-fold degenerate sites or full transcripts, with all nodes in the trees supported by a posterior probability of 1 or bootstrap support of 100% (supplementary fig. S4, Supplementary Material online). Importantly, this phylogenetic analysis demonstrates that both the bluefin tunas and visceral endotherms are paraphyletic, as was suggested by partial-genomic data (D ıaz-Arce et al. 2016). All individuals within each species formed monophyletic groups (supplementary fig. S5, Supplementary Material online), with Atlantic and Pacific bluefin tunas being segregated as distinct taxa. The Atlantic and Pacific bluefin tuna distinction is further supported by a Bayes Factor Delimitation model (posterior probability ¼ 0.999). Dating the Thunnus phylogeny using fossil calibration shows that this lineage has radiated rapidly within the last 6-10 My ( fig. 1). Notably, a high level of gene-tree versus speciestree discordance is observed, as indicated by quartet concordance factors <50% at internal nodes ( fig. 1). We calculated that this discordance did not deviate from expectations under ILS (P ¼ 0.2), which argues against ancestral hybridization events evident in the nuclear genome ( We did, however, find significant mitochondrial-nuclear discordance (supplementary fig. S8, Supplementary Material online). In the mitochondrial phylogenetic tree, as in other mitochondrial studies (Chow and Kishino 1995;Qiu et al. 2014;Bayona-V asquez et al. 2017), the Pacific bluefin and the albacore are clustered. In the nuclear-based tree the Pacific bluefin is sister to the Atlantic bluefin, more distantly related to the albacore ( fig. 1). This discordance has been used as evidence of introgression between Pacific bluefin and the albacore (Chow and Kishino 1995;Bayona-V asquez et al. 2017). By simulating gene trees, we found that the sister relationship of Pacific bluefin and albacore tunas is unlikely due to ILS alone (P ¼ 0.02). This shows that this mitochondrialnuclear discordance is indeed likely due to introgression.
Taken together, we find that Thunnus tuna show evidence of introgression in the mitochondrial, but not nuclear genomes. This has also been observed in a wide range of taxa (Zieli nski et al. 2013;Pons et al. 2014;Good et al. 2015), and is likely when selection has favored these mitochondrial variants and background selection and recombination has removed the introgressed nuclear variants (Bonnet et al. 2017;Sloan et al. 2017). However, this does little to explain the evolution of endothermy in bluefin tunas. Instead, much of the nuclear gene-tree discordance is likely due to standing variation from the ancestral populations being retained during a rapid radiation, as not all the variation has had time to be fixed between species splits (Pamilo and Nei 1988). We note that power to detect hybridization events is reduced as time since hybridization increases, particularly when speciation events are rapid (Folk et al. 2018). We therefore cannot conclusively rule out hybridization playing a greater role in the Thunnus radiation.

Parallel Selection on Standing Genetic Variation in Bluefin Tuna
To test whether parallel selection on this ancestral genetic variation underlies the evolution of endothermy in bluefin Ciezarek et al. . doi:10.1093/molbev/msy198 MBE tunas or visceral endotherms, we used a phylogenetic genome-wide association test (PhyloGWAS; Pease et al. 2016). We found that there was an excess of genes with nonsynonymous mutations shared by the three bluefin tuna species compared with expectations due to ILS alone (P < 0.0006). By contrast, there was no excess of shared nonsynonymous mutations between the visceral endotherms (P ¼ 0.14). We found parallel selection on standing genetic variation in 96 genes in the bluefin tunas (supplementary table S3, Supplementary Material online). Functional gene ontology (GO) enrichment analysis indicated enrichment in several terms relating to glycerol-3-phosphate dehydrogenase (GPDH) activity (supplementary table S4, Supplementary Material online). Furthermore, bluefin-specific nonsynonymous mutations were found in genes that are functionally related to aerobic metabolism Wiens et al. 2017) and relevant to the evolution of endothermy (table 1; Matsuda et al. 1993;Mattiazzi et al. 2002;Lushchak et al. 2014;Naiki et al. 2014). These genes are all characterized by one or two bluefin group-specific nonsynonymous substitutions ( fig. 1). This is consistent with previous studies that have found that the vast majority of genes with variants fixed by selection on standing variation are characterized by one or just a few mutations (Pease et al. 2016;Wu et al. 2017). Single mutations can, however, have strong effects on phenotype (Mattiazzi et al. 2002;Fox et al. 2017).
To understand the evolution of endothermy it is critical to elucidate how incremental increases in metabolic rate, oxidative pathways, and thermogenesis occur in any lineage. Studies to date have suggested metabolic rates may be higher in bluefin tuna species; Pacific bluefin tuna have higher metabolic rates than yellowfin tuna, acclimated at the same temperatures and caught in the same California current waters. These fish were trained to swim in a flume at the same swimming speed and were of similar size ). Measurements of southern bluefin tuna metabolic rates have also exceeded those from skipjack, albacore, kawakawa tuna (Euthynnus affinis), and yellowfin (Fitzgibbon et al. 2008). However, accurately measuring metabolic rates of large, active pelagic fish is difficult, and comparing between studies is difficult due to experimental variation which may substantially impact results.
Here we found bluefin tuna-specific mutations in six genes associated with aerobic capacity (table 1), supporting the view of unique adaptations in aerobic metabolism in the FIG. 1. Fossil-dated phylogeny of tunas and parallel selection in bluefin species. 3D surface protein structures for genes with shared nonsynonymous mutations in bluefin tunas and a function relating to aerobic metabolism are given in the blue box, with the two branches where parallel selection on these variants occurred highlighted with blue squares, with bluefin species highlighted in blue. 3D protein structures inferred for genes under lineage-specific selection in the warm-water group are given in the red box. The branch these changes correspond to is indicated with a red square, with warm-water species highlighted in red. Species with visceral endothermy are indicated with a "V." Amino acid changes and positions on the zebrafish reference genome (see table 1) are given, and their location on each protein model is highlighted in red. Species illustrations are from the FAO and wikimedia, rescaled according to the maximum length of each species, taken from Juan-Jord a et al. (2013). Gray error bars show 95% confidence intervals of divergence-date estimates. Black brackets on root node indicate minimum and maximum fossil calibration. Node labels are Bayesian posterior probability (pp), followed by concordance factors (cf) for the primary quartet inferred by ASTRAL; values lower that 100% indicate increasing gene-tree discordance, which in this case are within expectation from ILS.
Diversification of Endothermic Tunas . doi:10.1093/molbev/msy198 MBE bluefin tunas. This includes isoforms of GPD1 (GPD1b and GPD1c), which works in concert with mitochondrial GPD2 to form the glycerol-3-phosphate (G3P) shuttle. This pathway uses the NADH synthesized during glycolysis to contribute electrons to the oxidative phosphorylation pathway in the mitochondria, fueling ATP synthesis. ATP synthesis by G3Pmediated respiration is inefficient, as only two ATP molecules are synthesized per NADH molecule, instead of the three ATP derived from NADH formed inside the mitochondria. However, it sustains a high rate of oxidative phosphorylation (Gong et al. 1998). This metabolic inefficiency, coupled with the high oxidative phosphorylation rates, produces heat elevating tissue temperatures locally. This pathway has been found to be important for thermogenesis in mammals and bumble bee flight muscles (Gong et al. 1998;Masson et al. 2017). GPDH also plays an important role in lipid metabolism (Hao et al. 2015), which these mutations also may relate to. Selection for aerobic metabolism in bluefin tunas is further implicated by mutations in key oxidative phosphorylation genes (ATP5C1; Matsuda et al. 1993), Krebs cycle (ACO2; Lushchak et al. 2014) and b-oxidation (HADHB; Naiki et al. 2014) genes, as well as in SOD1. This latter gene codes for the enzyme that removes toxic reactive oxygen species (ROS) produced during aerobic respiration (Mattiazzi et al. 2002). Recent measurements on isolated mitochondria of Pacific bluefin tuna indicate that they produce ROS at a similar rate to ectothermic fish species at a similar temperature (Wiens et al. 2017). However, this rate is temperature dependent, meaning that the elevated body temperature in bluefin tuna tissues will increase metabolism and ROS production and increase the risk of mitochondrial damage (Murphy 2009). Notably, the amino acid substitutions in SOD1 in bluefin tunas ( fig. 1) are both adjacent to a well-characterized mutation site in mice. G93A transgenic mice show significant defects in mitochondrial function due to increased oxidative damage (Mattiazzi et al. 2002). The proximity of the bluefin substitutions to this site suggests that they possibly relate to reducing oxidative damage, which would be exacerbated by elevated metabolic rates associated with selection for endothermy.
Using 3D-structure models predicted using Phyre2 (Kelley et al. 2015) for each of these proteins, we identified that all nonsynonymous mutations fell at amino acid sites on the surface of the protein, except for that in ATP5C1 ( fig. 1). None of these mutations is particularly conserved across other organisms (ConSurf score 1-6). However, we identified that these amino acid changes significantly alter either protein electrostatic potential or stability (table 1), which likely indicate functional roles associated with these mutations. Overall, our analyses indicate that parallel selection on genetic variants relating to both aerobic metabolism pathways and oxygen utilization has contributed to the unique phenotype of bluefin tunas. Experimental validation of these candidate genes, alongside detailed analysis of the G3P-shuttle in the context of isolated mitochondrial function and oxidative phosphorylation in tuna is now necessary. These experiments could determine whether selection for G3P-mediated respiration provides novel pathways for heat production in bluefin tunas, as in bees and mammals.

Positive Selection in Warm-Water Tunas
Our transcriptomic data indicate that warm-water and tropical tunas (bigeye, yellowfin, longtail, and blackfin) form a clade ( fig. 1)  Ciezarek et al. Destroys toxic free radicals, the majority of which are produced by mitochondria (Mattiazzi et al. 2002) NOTE.-Amino acid changes are provided in figure 1.  MBE bluefin genes, the amino acid changes were at the surface ( fig. 1), in variable amino acid sites (ConSurf score 1-6), but with significant impacts on the overall protein electrostatic potential. The substitution in GRNA at zebrafish amino acidsite 461 decreased electrostatic potential (nonparametric Wilcoxon signed-rank test Z-score ¼ À5.8), whereas that at 468 increased it (Z-score ¼ 5.9). The substitution at CRNKL1 site 683 decreased electrostatic potential (Z-score ¼ À4), whereas that at site 748 did not (Z-score ¼ 0.3). None of these substitutions strongly influenced protein stability (pseudo DDG À0.09 to 0.32). Importantly, many more genes specifically associated with reproduction and maturation may not have been expressed in our muscle samples.

Bluefin Tuna Species Are Evolutionarily Distinct and Globally Endangered
Finally, combining red list status from the World Conservation Union (IUCN) and branch lengths in our phylogenetic trees, we calculated EDGE scores for each species (i.e., evolutionary distinctness and globally endangered status [Isaac et al. 2007], table 2). This score is a popular metric in conservation biology, as it identifies those threatened species that deserve particular attention because of their unique evolutionary history. Relative to other tunas, we found the highest scores in southern bluefin (5.1) and Atlantic bluefin (4.3). We also highlight the importance of gathering data for the longtail tuna, which is currently classified as "data-deficient," but has a substantial global fishery yield of 201,894 tonnes in 2015 (table 2). Longtail tuna may be increasingly targeted directly and as bycatch, as the populations of other species with large fisheries are decreasing or are quota limited. Tunas are unusual among bony fish for their evolution of endothermy (Block et al. 1993). Our analyses shed light on the phylogeny and genetic basis of the evolution of endothermy in tunas. We found bluefin-specific nonsynonymous mutations in six key aerobic metabolism genes, supporting the hypothesis that adaptations in aerobic metabolism are key to the endothermy of these species, and their ability to expand their niche into high latitudes. Further studies may wish to consider whether there has been any genetic convergence with the slender tuna, Allothunnus fallai, whose phylogenetic placement is currently unclear (Sepulveda et al. 2008;Santini et al. 2013). Because we have shown a high degree of genetree discordance among the Thunnus species, indicating a high degree of ancestral variation, trait evolution should not be considered to show a stepwise progression across the Thunnus phylogeny. This will influence how we analyze the diversification of further important traits as increasing information becomes available. For example, hypoxia tolerance varies considerably among the five Thunnus species for which it has been measured, inconsistently with the current species tree (Bernal et al. 2017). This trait will have strong impacts on future tuna distributions as deoxygenation of the ocean will increase under global warming scenarios (Breitburg et al. 2018). The rapid availability of fully sequenced genomic resources particularly from bluefin tunas will improve our capacity to study these unique fish.

Sample Collection and RNA Sequencing
Samples were collected from multiple individuals of all eight Thunnus species along with the Skipjack tuna, Katsuwonus pelamis, which was used as an outgroup. Short-read sequence data downloaded from the NCBI Short Read Archive for 19 individuals were supplemented with samples collected from fish markets (the United Kingdom and the Netherlands) and from the wild (Bahamas, southern Australia, and California; supplementary table S1, Supplementary Material online) by the Tuna Research and Conservation Center, University of Tasmania, and Cape Eleuthera Institute. Tissue samples stored in RNALater (Thermo Fisher Scientific, Waltham, MA) were sent to BGI Tech Solutions, Hong Kong. There, RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA). Using the TruSeq RNA Library Preparation Kit (v2), cDNA libraries were produced. These were then sequenced using the Illumina HiSeq 4000 (Illumina Inc, San Diego, CA) with 100-bp paired-end reads. Raw sequencing reads have been deposited to NCBI GenBank under BioProject PRJNA495053.

Read Processing and Reference Transcriptome
Initial quality control was carried out by BGI Tech Solutions, with low-quality reads (average phred < 20), primer and adapter sequences trimmed. Upon retrieval, sequencing errors in these reads were corrected using Rcorrector (V1.0.2; Song and Florea 2015), then further trimmed for low-quality bases and adaptor sequences (phred < 2, following Macmanes [2014]), using Trim Galore! (v0.4.0; http:// www.bioinformatics.babraham.ac.uk/projects/trim_galore/; last accessed March 31, 2017). Reads for each of the three Pacific bluefin tuna individuals with multiple tissue types sequenced were normalized in silico to a depth of 100Â using Trinity (v2.4.0; Grabherr et al. 2011). Separate assemblies were carried out for each of the three individuals, using multiple assembly softwares and k-mer length settings. For Trinity (v2.4.0), Bridger (v2014-12-01; Chang et al. 2015), and Binpacker ( six each for SOAPdenovo-trans, Velvet-OASES, Trans-ABySS, and Shannon; one for IDBA-tran, which builds on each previous k-mer length assembly resulting in one final assembly), for 102 unique assemblies in total. Only transcripts of at least 300 bp were retained. Coding sequences (CDS) were inferred from these using TransDecoder (v3.0.1; Grabherr et al. 2011). CDS from all 102 assemblies were concatenated and clustered using CD-HIT-EST (Fu et al. 2012), with the settings -aL 0.005 -aS 1 -c 0.97 -d 0 -G 0 -M 0 (Cerveau and Jackson 2016). Contigs generated by multiple assembly softwares and k-mer settings are less Ciezarek et al. . doi:10.1093/molbev/msy198 MBE likely to be artifacts (Cerveau and Jackson 2016;Durai and Schulz 2016). We therefore retained the longest CDS representative of clusters containing clusters corresponding to at least an average of two assembly softwares with two kmer settings each per individual. The transcript corresponding to each of these CDS was then extracted to give the final merged assembly. Completeness of the merged assembly and individual assemblies were assessed using BUSCO (Benchmarking Universal Single-Copy Orthologs; Simão et al. 2015) and the Actinopterygii_odb9 database (supplementary table S2, Supplementary Material online). A transcript-to-gene map was constructed using CORSET (Davidson and Oshlack 2014) and the mappings of the three Pacific bluefin tuna used to construct the final reference transcriptome.

Read Alignment
Reads from all individuals were separately mapped against this reference transcriptome using STAR (v2.5.3a; Dobin et al. 2013) and the double-pass method, allowing for any number of hits, and scoring all hits with an equal best mapping score as primary. Reads were realigned around indels using GATK (v3.7;McKenna et al. 2010). Genotypes were then inferred using samtools and bcftools (v1.5; Li et al. 2009;Li 2011). Bases with a base quality <20 were filtered using samtools mpileup. Using bcftools call and filter, heterozygous sites with either allele represented by <2 reads were trimmed, as were sites with high-quality read depth <3, genotype or variant quality <20 and single nucleotide polymorphisms (SNPs) within 3 bp of an indel. Resultant vcf files were converted to multisample fasta files using vcf2fas (v17072015; https://github.com/brunonevado/vcf2fas. Last accessed June 16, 2017. Indels were coded as missing data. IUPAC (International Union of Pure and Applied Chemistry) ambiguity codes were used for heterozygous sites.

Phylogenetic Reconstructions
Phylogenetic trees were reconstructed using both supermatrix and summary multispecies coalescent (MSC) tools. For the MSC tree, transcript trees were inferred for each transcript. These were first filtered to remove columns with <10% occupancy and sequences with >50% gaps (Sayyari et al. 2017). Transcripts that then had sequences from <4 species were discarded. Trees were then inferred for each using RAxML (v8.2.10; Stamatakis 2014), with 200 rapid bootstraps and the GTRGAMMA model of evolution. SH-like (Shimodaira-Hasegawa-like) node support values were subsequently calculated (Anisimova et al. 2011). Transcript trees were discarded if the three skipjack tuna individuals (where sequence data were present) were not monophyletic, to remove trees with unrealistic deep coalescences. Transcript trees with at least one node with SH-like support >10 were retained, with only the transcript with most nodes with SHlike support >10 per CORSET cluster retained, in order to ensure the independence of markers. The species tree was then inferred using ASTRAL (v5.5.6; Mirarab et al. 2014). Two runs were carried out with forced species monophyly: one with poorly supported nodes (SH-like <10) collapsed into hard polytomies, as recommended by Zhang et al. (2018), and another where they were not. Gene-tree concordance values of each primary split, calculated using ASTRAL, in the first of these trees are reported. These indicate the percentage of gene trees supporting the species-tree relationship for each branch within a tree. A further run was carried out with the hard polytomy data set without forced species monophyly in order to ensure each species formed monophyletic groups.
Supermatrix-based phylogenetic trees were also inferred. Fixed nucleotides for each species from all the transcripts used in the ASTRAL phylogeny were concatenated. A maximum-likelihood phylogeny was inferred using RAxML (v8.2.1) with 200 rapid bootstraps and the GTRGAMMA model of evolution. A Bayesian phylogenetic tree was also inferred using ExaBayes (v1.5; Aberer et al. 2014). Four runs of three coupled chains were carried out for 1 million generations (25% as burn-in). In order to ensure Monte Carlo Markov chains were run long enough for accurate estimation of posterior means, we ensured the effective sample size were >200 for all parameters, alongside a potential scale reduction factor of 1 in order to assess convergence between chains. To account for the potential effect of selection, a further Bayesian phylogenetic tree was also inferred for concatenated 4-fold degenerate sites from the transcript set.
To infer a mitochondrial phylogenetic tree, reads from all Thunnus individuals were mapped against a reference Pacific bluefin tuna mitochondrial genome (NCBI accession number: NC_008455), with Katsuwonus individuals mapped against a reference skipjack tuna mitochondrial genome (NC_005316). Reads were mapped using STAR as above except allowing a maximum of two hits per read. They were subsequently genotyped and converted to fasta using samtools and bcftools. This was as above, except using the bcftools call setting "-ploidy 1," not using homozygous blocks. CDS for each of the 13 genes of the mitochondrial genome for each individual were extracted using the MITOS webserver (Bernt et al. 2013). These were aligned using MAFFT (v7.271;Katoh and Standley 2013) and concatenated. A phylogenetic tree was then inferred, using ExaBayes, as with the nuclear supermatrix tree. The species identity of all individuals was verified by NCBI BlastN search of these mitochondrial CDS against the NCBI nr database, in addition to species monophyly in mitochondrial (supplementary fig. S8,

Test for Species Delimitation
To test for species delimitation between the Atlantic and Pacific bluefin tuna, we implemented Bayes Factor Delimitation (BFD*; Leache et al. 2014), implemented in SNAPP (v1.3.0;Bryant et al. 2012), a package from BEAST (v2.4.7;Bouckaert et al. 2014). To do this, we generated a data set containing only Atlantic and Pacific bluefin tuna individuals, alongside bigeye tuna individuals as outgroups. We filtered this to include only one biallelic SNP (within the Pacific and Atlantic bluefin) with minor allele count >1 and with all individuals present per CORSET cluster. Delimitation runs were run for 48 steps at a chain length of 200,000 each, following 50,000 as pre-burn-in, with a gamma lambda prior of (2, 200). Two models were compared: 1) a model where individuals of Atlantic bluefin and Pacific bluefin corresponded to only one species, and 2) a model where individuals of Atlantic and Pacific bluefin correspond to two separate species (the current delimitation). Bigeye tuna was included as an outgroup in both analyses, designated as a separate species. Convergence was assessed by two separate runs of each model converging to within 1 log-likelihood unit. A Bayes Factor >10 was used to determine significance (Kass and Raftery 1995), and a posterior probability calculated by Bayes Factor/(Bayes Factor þ 1).

Timetree Inference
Using the concatenated transcript data set used in the supermatrix analysis and a fossil calibration, we estimated divergence dates for the Thunnus tuna. A hard-minimum fossil calibration of 37.8 My was used on the root of the tree based on the earliest known stem-group Thunnus fossil, T. abchasicus, which has been documented from the mid-Eocene in Russia (Monsch and Bannikov 2011). A soft-maximum age calibration of 56 My was used, corresponding to the start of the Eocene period. MCMCTree (v4.9e; Yang and Rannala 2006) was used for dating analysis. Following a burn-in of 100,000 iterations, Markov chains were sampled every 1,000th iteration until 40,000 trees were sampled, using the approximate likelihood algorithm. Priors for sigma2 and rgene were set to G(1, 10) and G(2, 11335) based on substitution rates inferred using BaseML. As the maximum-likelihood phylogenetic tree inferred from the same data set was relatively clock-like (root-to-tip variance 0.0000003) we used a global molecular clock to date the tree, following Walker et al. (2017). Two runs were carried out, with convergence of mean posterior times assessed, and infinite-site plots used to assess linearity of data (supplementary fig. S9, Supplementary Material online).
This dated tree was used to calculate EDGE scores (Isaac et al. 2007), based on IUCN red list threat-status (GE, as of February 2018; IUCN 2017) and evolutionary distinctness (ED) scores calculated in the R package "caper" (Orme 2013). EDGE scores for each species were calculated as follows:

Genetic Structure
To assess genetic structure among the eight Thunnus species, we used an MDS and hierarchical clustering, using ADMIXTURE (v1.3;Alexander et al. 2009). Each Thunnus individual was regenotyped not allowing for homozygous blocks. Resultant vcf files were merged and filtered to include with at least one SNP, no indels, <5% missing taxa, and minor allele count >1 using vcftools (v0.1.3.2; Danecek et al. 2011). Using PLINK (v1.9b;Purcell et al. 2007), SNPs with an R 2 value >0.1 of any other SNP within a 50-bp sliding window were removed to ensure unlinked SNPs were analyzed. ADMIXTURE runs were carried out with k values from 1 to 10, with the optimal run assessed using the lowest ADMIXTURE CV error. MDS analysis was carried out in PLINK.

Tests for Introgression
To test whether a phylogenetic tree or a phylogenetic network, including hybridization events, best explains the data, we used a maximum pseudolikelihood approach (Sol ıs-Lemus and An e 2016), implemented within the Julia package PhyloNetworks (v0.6.0; Sol ıs-Lemus et al. 2017). Uncollapsed transcript trees with >10 nodes with SH-like support >80 were used (only the transcript tree with the most such nodes per CORSET cluster was retained). Tip-based quartet concordance factors were calculated for each set of four individuals using the readTrees2CF function. Inter-and intraspecific concordance factors were then calculated from these using the mapAllelesCFtable function. Using SNaQ (Species Networks apply Quartets), a phylogenetic tree with no hybridization events was inferred. In order to assess whether the MSC adequately explains gene-tree discordance to this species tree, we used the TICR test (Tree Incongruence Checking in R; Stenz et al. 2015), using the "phylolm" R package. A chisquared test was used to compare observed concordance factors with expected concordance factors calculated from the species tree under the MSC. Lack of significance (P > 0.05) would indicate that the coalescent tree inferred without introgression events adequately fits the data.
To test whether the mitochondrial genome clustering of albacore and Pacific bluefin tuna is caused by introgression, we simulated gene trees under coalescence using "ms" (Hudson 2002), according to the coalescent units inferred by SNaQ. These coalescent units were unaltered, as they were first multiplied by 4, to account for the effective population size of the mitochondrial genome being one-fourth of the nuclear genome (Latta 2006), but then divided by 4, as coalescent units in SNaQ are defined as generations/effective population size, whereas in ms they are generations/4 Â effective population size. In total, 100 replicates of 100,000 gene trees were simulated. The average frequency per replicate where Pacific bluefin and albacore clustered was used as a P value, with P < 0.05 suggesting their clustering to be unlikely due to ILS alone (Buckley et al. 2006).

Detecting Selection
We inferred selection on genetic variants in three groups: 1) the bluefin species; and 2) the warm-water species Ciezarek et al. . doi:10.1093/molbev/msy198 MBE (yellowfin, bigeye, longtail, and blackfin); and 3) the visceral endotherm species (albacore, the three bluefin species, and bigeye). As they were monophyletic in the species tree, we tested for selection on de novo mutation in the warm-water tuna using the CodeML branch-site test (Zhang et al. 2005), within PAML v4.9e (Yang 2007). As they were not monophyletic, we tested for parallel selection on ancestral variation in the bluefin tunas and visceral endotherm tunas using a phylogenetic genome-wide association study (PhyloGWAS; Pease et al. 2016), implemented in MVFtools (v0.5.1.3; Pease and Rosenzweig 2015). Fixed sites for the longest CDS per CORSET cluster were used for all analyses.
For the CodeML branch-site test, genes whose gene trees significantly differed from the species tree (SH-test P < 0.05) and in which the warm-water species were not monophyletic were discarded. Gene trees were used for those that significantly differed, but still had the warm-water species monophyletic. The species tree was used for the remainder of transcripts. For each CDS, four CodeML runs were carried out: a null model, where no site allows for x > 1 in the target branch was compared with three runs of an alternate model, with an added site class allowing for x > 1, each with different starting values of x (0.5, 1, 1.5). Likelihood-ratio tests between each of these runs and the null were carried out, with significance inferred if P < 0.05 in all three runs (v 2 1 ). Gaps in the sequence data were allowed, but significant genes (P < 0.05) where the associated nonsynonymous variants were present in only one of the warm-water species were discarded.
PhyloGWAS assesses whether there is an excess of nonsynonymous variants that are shared by individuals that are not monophyletic but share a phenotypic trait. Codon sites from all genes were filtered to remove sites with >2 nonsynonymous variants among the Thunnus, as these may reflect multiple changes rather than parallel selection on ancestral variation. Codon sites with >2 missing taxa were also filtered. The number of genes containing nonsynonymous mutations in the bluefin tunas and visceral endotherm tunas were calculated using MVFtools. Sites were only counted in the bluefin analysis if they had sequence data for each of the three bluefin species alongside the albacore tuna. Sites were only counted in the visceral endothermy analysis if they had sequence data for each of the three species lacking visceral endothermy (the longtail, blackfin, and yellowfin) alongside the bigeye tuna. To assess significance, this number was compared with the expected number shared due to ILS alone. To do this, 100,000,000 genes with a single change were simulated using ms (Hudson 2002) over the consensus phylogeny, using coalescent units inferred by SNaQ (these were divided by 4 as SNaQ coalescent units are generations/effective population size, whereas ms coalescent units are generations/4Â effective population size). Two chromosomes (as tuna are diploid) were simulated for one individual of each species. The P values were the proportion of the simulated data sets that had at least the same number of shared substitutions as the observed number, out of a sample size the same as the number of variable amino acid sites tested. The number of variable amino acid sites was counted across all genes tested, including variable amino acids among species, which were previously filtered out when identifying fixed amino acids for each species. To allow for the codons with missing taxa in our data set, the number of sites in the simulated data set that fit the pattern except for one or two of the possible missing taxa was also counted, but weighted by the number of tested codons that had missing taxa.
GO term enrichment for genes under selection in each group was assessed using the topGO R package and the Fisher's exact test with the "weight01" algorithm, and P < 0.01 for significance (weight01 P values are deemed adjusted; Alexa et al. 2006). GO terms with only one representative in the significant set were discarded. Zebrafish orthologs for genes with functions relating to aerobic metabolism, which is hypothesized to relate to endothermy in bluefin tunas ), were downloaded from UniProt (UniProt Consortium 2015). This was aligned with translated CDS from the tuna, using MAFFT (v7.271), and used to annotate which site the bluefin substitution was at. The same procedure was used for the warm-water tuna selection genes.
To examine possible functional effect of nonsynonymous variants, we predicted 3D protein structure for each of the identified candidate genes using the Phyre2 (Protein Homology/analogY Recognition Engine v2.0) webserver (http://www.sbg.bio.ic.ac.uk/phyre2; last accessed May 11, 2018; Kelley et al. 2015), using the "intensive" modeling mode. Prior to this, amino acids prior to the zebrafish start codon in the MAFFT alignment were trimmed. The evolutionary conservation of each nonsynonymous mutation was identified using the ConSurf webserver (http://consurf.tau.ac.il/ 2016; last accessed May 11, 2018). Slowly evolving regions are likely to have functional effects. Each amino acid site is therefore scored from 1 to 9, where 1 is highly variable and 9 is highly conserved (Ashkenazy et al. 2016). Effects of each mutation on protein stability were assessed using the SDM2 (Site Directed Mutator v2) webserver (http://structure.bioc.cam.ac.uk/sdm2; last accessed May 11, 2018; Pandurangan et al. 2017). Changes in electrostatic potential of each protein, which is responsible for catalytic activity in many enzymes, were measured using the MutantElec webserver (http://structuralbio.utalca.cl/ mutantelec; last accessed May 11, 2018. Valdebenito-Maturana et al. 2017. MutantElec assesses whether amino acid changes significantly increase or decrease electrostatic potential using a nonparametric Wilcoxon signed-rank test with a confidence interval of 0.05.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Reis, and one anonymous reviewer for comments on the manuscript, Kenneth Monsch for discussion of fossil calibration and NERC, the UK BBSRC, Stanford University and the Monterey Bay Aquarium for support of the Tuna Research and Conservation Center, the Ocean Foundation and the TAG A Giant Fund of the Ocean Foundation, and Leverhulme Trust for funding. The authors declare no competing interests.