FEVER: an interactive web-based resource for evolutionary transcriptomics across fishes

Abstract
Teleost fish represent one of the largest and most diverse clades of vertebrates, which makes them valuable models in research areas such as ecology and evolution. Recent sequencing endeavors have provided high-quality genomes for species covering the main fish evolutionary lineages, opening the way to large-scale comparative genomics studies. However, transcriptomic data across fish species and organs are heterogeneous and have not been integrated with newly sequenced genomes, making gene expression quantification and comparative analyses particularly challenging. Thus, resources integrating genomic and transcriptomic data across fish species and organs are still lacking. Here, we present FEVER, a web-based resource allowing evolutionary transcriptomics across species and tissues. First, based on query genes, FEVER reconstructs gene trees providing orthologous and paralogous relationships as well as their evolutionary dynamics across 13 species covering the major fish lineages, and 4 model species as evolutionary outgroups. Second, it provides unbiased gene expression across 11 tissues using up-to-date fish genomes. Finally, genomic and transcriptomic data are combined, allowing the exploration of gene expression evolution following speciation and duplication events. FEVER is freely accessible at https://fever.sk8.inrae.fr/.


Introduction
Gene duplication and subsequent expression changes (e.g. sub-/neo-functionalization) are thought to underlie many phenotypic differences across animal species (1-6). Teleost fish represent a unique opportunity to study this phenomenon, as they represent nearly half of all vertebrate species and exhibit an outstanding phenotypic diversity due to their adaptation to a wide range of ecological niches. Notably, their common ancestor underwent a Whole Genome Duplication event (WGD) approximately 300 million years ago (called Ts3R) that provided new raw genetic material for natural selection to act on (7). Since then, intense genomic reshuffling has occurred, which probably facilitated the diversification of species and organs (8). Specific fish lineages, such as salmonids (9) and carps (10), underwent supplementary WGDs (called Ss4R and Cs4R, respectively). Additionally, fish species such as zebrafish and medaka are major models in research fields such as developmental biology, biomedical research, ecology and evolution.
Recent sequencing endeavors have provided high-quality genomes for species covering the main fish evolutionary lineages (8,11). However, transcriptomic data across fish species and organs are heterogeneous and have not been integrated with newly sequenced genomes, making gene expression quantification and comparative analyses particularly challenging. Thus, tools allowing the exploration of gene evolutionary history and expression profiles across species and organs in fish are still lacking.
To fill this gap, we developed the FEVER web service, which provides a user-friendly interface to rapidly explore the evolutionary dynamics of genes of interest across fish species and tissues at the structural (gene gain and loss) and expression levels. First, FEVER reconstructs gene trees using methods inspired by landmark comparative resources (12-14). These trees provide orthologous and paralogous relationships as well as gene evolutionary dynamics across 13 species covering the major fish lineages (bowfin, spotted gar, european eel, allis shad, zebrafish, striped catfish, mexican tetra, atlantic cod, medaka, northern pike, eastern mudminnow, brown trout, and rainbow trout), and 4 model species as evolutionary outgroups (human, mouse, fruitfly and roundworm) (Figure 1). Second, it supplies unbiased gene expression across 11 tissues using existing RNA-seq datasets (15) and up-to-date genomes. Indeed, due to the lack of reference genomes, these RNA-seq data (15) were previously mapped on de novo contigs, which strongly biased expression quantification and cross-species investigations. Finally, by combining genomic and remapped transcriptomic data, FEVER offers the possibility to rapidly explore gene expression evolution following speciation and duplication events. FEVER is implemented in R and is accessible at https://fever.sk8.inrae.fr/ with all major browsers. This website is free and open to all users and there is no login requirement.

Gene expression quantification
RNA-seq datasets across 11 tissues (brain, gills, heart, muscle, liver, kidney, bones, intestine, embryo, ovary, and testis) and 13 species (rainbow trout, brown trout, eastern mudminnow, northern pike, medaka, atlantic cod, mexican tetra, striped catfish, zebrafish, allis shad, european eel, spotted gar and bowfin) were downloaded from a previous study (15). Further details on those samples are available in the biosample and bioproject files deposited in SRA under the PhyloFish umbrella project. Known 3′ adaptors and low-quality bases (Phred score < 20) were trimmed from raw reads with TrimGalore (https://github.com/FelixKrueger/TrimGalore, v.0.6.6; parameters: -r_clip 13 -three_prime_clip 2). Next, we mapped the trimmed reads from each library against reference genomes and annotated transcripts using Salmon (20) (v.1.8; parameters: defaults). Gene expression levels were measured in transcripts per million (TPM), a unit which corrects for both feature length and sequencing depth. Tissue-specificity indexes are based on the Tau metric of tissue specificity (21), ranging from 0 (broad expression) to 1 (restricted expression). Tau is calculated for each gene across the 11 tissues listed above and is not calculated when at least one expression value is missing. A gene is considered tissue-specific when its Tau is > 0.9.
Tau is defined as:

$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1}$$

where $N$ is the number of tissues and $x_i$ is the expression profile component normalized by the maximal component value.
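The two quantities used in this section can be illustrated with a minimal Python sketch. It is not the FEVER implementation (which relies on Salmon for TPM estimation and is written in R). The function names `tpm`, `tau` and `is_tissue_specific` are hypothetical, and the sketch assumes that Tau is computed directly on untransformed TPM values, which the text does not specify. Tau is taken here as the sum over tissues of (1 - x_i), divided by N - 1, with x_i the expression in tissue i divided by the maximal expression of the gene.

```python
import math


def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to TPM.

    counts: reads assigned to each transcript.
    lengths_bp: (effective) transcript lengths in base pairs.
    Dividing by length corrects for feature length; rescaling the
    rates so that they sum to one million corrects for sequencing depth.
    """
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def tau(expression):
    """Tau tissue-specificity index for one gene across N tissues.

    Returns None when a value is missing (as stated in the text) or
    when the index is undefined (fewer than two tissues, or no expression).
    Assumption: values are raw TPM, not log-transformed.
    """
    if any(x is None or (isinstance(x, float) and math.isnan(x))
           for x in expression):
        return None
    n = len(expression)
    if n < 2:
        return None
    peak = max(expression)
    if peak <= 0:
        return None
    # x_i = expression / peak, so each term (1 - x_i) lies in [0, 1]
    return sum(1 - x / peak for x in expression) / (n - 1)


def is_tissue_specific(expression, threshold=0.9):
    """Apply the Tau > 0.9 rule used to call a gene tissue-specific."""
    t = tau(expression)
    return t is not None and t > threshold
```

For instance, a gene expressed in a single tissue, such as `tau([10.0, 0.0, 0.0, 0.0])`, reaches the maximum of 1, whereas a uniformly expressed gene, such as `tau([5.0, 5.0, 5.0, 5.0])`, gives 0.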

FEVER web server implementation
FEVER is implemented in R (Shiny) and is compatible with most web browsers (Google Chrome, Apple Safari, Mozilla Firefox and Microsoft Edge) across the major operating systems (Windows, macOS and Linux). FEVER is hosted by the sk8 project, which is based on technologies such as GitLab and CI/CD, Docker and Kubernetes. Using this website requires session cookies.

Results
FEVER provides a simple interface to investigate the evolution of gene expression across fish species. To select a query gene, FEVER supplies a tool to search for genes of interest across 17 species (Figure 1) based on their gene ID, gene symbol or longest transcript ID (Figure 2A). Based on this query gene, FEVER displays the reconstructed gene tree, which provides orthologous and paralogous relationships across species through speciation and duplication events (Figure 2B). In addition, FEVER provides the expression (as z-scores on a heatmap) of all genes belonging to the tree across species and tissues (Figure 2C). The tissue specificity (Tau) of all genes and, when applicable, the tissue a gene is specific to are displayed on the heatmap (Figure 2C). For an easier comparison across species and tissues, bar plots showing expression values (TPM) are also shown (Figure 2D). Gene trees and expression values can be downloaded from the web service. Gene trees, nucleotide alignments and protein alignments have been deposited in Zenodo (https://doi.org/10.5281/zenodo.10844755).
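The z-score scaling behind such a heatmap can be sketched in Python as follows. The text does not state how FEVER computes its z-scores, so this sketch makes two assumptions: values are scaled per gene across tissues, and the sample standard deviation is used. The function name `zscores` is hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Scale one gene's expression values across tissues to z-scores.

    Assumptions (not taken from the source): scaling is done per gene,
    and the sample standard deviation (n - 1 denominator) is used.
    A gene with constant expression, or a single value, is returned
    as all zeros instead of raising a division error.
    """
    if len(values) < 2:
        return [0.0 for _ in values]
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]
```

With this scaling, each row of the heatmap shows in which tissues a gene is expressed above or below its own average, which makes genes with very different absolute TPM levels visually comparable. Absolute levels remain available in the bar plots.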

Case study
To illustrate the features of FEVER, we present and interpret the results obtained with MCAM as a query gene. MCAM is involved in cell adhesion and in the cohesion of the endothelial monolayer at intercellular junctions in vascular tissue. FEVER revealed that this gene was duplicated by the teleost-specific WGD (Ts3R) and maintained in two copies (MCAMA and MCAMB) in most teleost species (Figure 2B).
At the expression level, we note that MCAMA is specific to brain and MCAMB to heart in most teleost fish species, while the pre-Ts3R MCAM gene was likely to be particularly expressed in heart (Figure 2C), as inferred from MCAM expression in bowfin and mammals (from external data: https://apps.kaessmannlab.org/evodevoapp/). Additionally, MCAMA and MCAMB were duplicated once again by the salmonid-specific WGD (Ss4R), resulting in four MCAM copies in the brown trout, for instance. Interestingly, one post-Ss4R MCAMA paralog became specific to testis while the other kept the pre-Ss4R MCAMA brain specificity, and the two post-Ss4R MCAMB paralogs maintained the pre-Ss4R MCAMB heart specificity. This example illustrates that gene tissue specificity may change following WGD events and that FEVER is a useful tool to assess gene evolutionary dynamics and identify potential sub-/neo-functionalization. Expression values (TPM) can be easily compared across tissues, as exemplified (Figure 2D) in bowfin (one MCAM copy) and zebrafish (two MCAM copies, mcama and mcamb).

Discussion
The FEVER web server supplies a simple interface to rapidly investigate the evolution of genes of interest across fish species following gene duplication events. Gene loss should be inferred with care, because apparent gene loss can also result from genes missing in the annotations (e.g. mitochondrial genes) or from inconsistent gene trees (e.g. fast-evolving genes). In this respect, it was previously observed that a fraction of gene trees reconstructed with similar methods in teleost fishes was inconsistent with their syntenic context, resulting in misplaced WGD events (22). Consequently, SCORPiOs (22) was developed to correct those inconsistencies using synteny-based approaches. We therefore plan to integrate SCORPiOs into our gene tree reconstruction pipeline in the next update. Furthermore, caution should be exercised when making cross-species comparisons of expression data in embryos and gonads, as synchronizing developmental stages and maturation status across species poses particular challenges. Future updates should also enable the user to BLAST any sequence of interest to investigate the evolution of the closest genes present in our dataset. Finally, we plan to progressively add species and expression data from other sources for teleost and non-teleost species.

Figure 1. Species used for the reconstruction of gene trees. The phylogenetic tree and divergence times are based on TimeTree (19) (v5) (http://www.timetree.org). Red dots indicate Whole Genome Duplication events (WGD). The bar plot shows the number of protein-coding genes of each species considered for the reconstruction of gene trees. Blue and red bars correspond to genes in trees and not in trees, respectively.

Figure 2. The FEVER web server interface and result page based on the query gene MCAM (Melanoma cell adhesion molecule). (A) FEVER provides an easy-to-use interface to search for genes of interest across 17 species. (B) FEVER displays the gene tree containing the query gene, with duplication and speciation nodes annotated in red and blue, respectively. (C) The heatmap shows gene expression across species and tissues as well as tissue-specificity indexes (Tau) and the tissue the genes are specific to, when applicable. (D) The bar plots show expression values (TPM) across tissues and organs for genes belonging to the same tree as the query gene. Expression values for MCAM in bowfin and mcama/mcamb in zebrafish are shown as a case study.