A Chromosome-Level Assembly of Blunt Snout Bream (Megalobrama amblycephala) Genome Reveals an Expansion of Olfactory Receptor Genes in Freshwater Fish

Abstract The number of olfactory receptor genes (ORs), which are responsible for detecting diverse odor molecules varies extensively among mammals as a result of frequent gene gains and losses that contribute to olfactory specialization. However, how OR expansions/contractions in fish are influenced by habitat and feeding habit and which OR subfamilies are important in each ecological niche is unknown. Here, we report a major OR expansion in a freshwater herbivorous fish, Megalobrama amblycephala, using a highly contiguous, chromosome-level assembly. We evaluate the possible contribution of OR expansion to habitat and feeding specialization by comparing the OR repertoire in 28 phylogenetically and ecologically diverse teleosts. In total, we analyzed > 4,000 ORs including 3,253 intact, 122 truncated, and 913 pseudogenes. The number of intact ORs is highly variable ranging from 20 to 279. We estimate that the most recent common ancestor of Osteichthyes had 62 intact ORs, which declined in most lineages except the freshwater Otophysa clade that has a substantial expansion in subfamily β and ε ORs. Across teleosts, we found a strong association between duplications of β and ε ORs and freshwater habitat. Nearly, all ORs were expressed in the olfactory epithelium (OE) in three tested fish species. Specifically, all the expanded β and ε ORs were highly expressed in OE of M. amblycephala. Together, we provide molecular and functional evidence for how OR repertoires in fish have undergone gain and loss with respect to ecological factors and highlight the role of β and ε OR in freshwater adaptation.


Introduction
Olfaction is an important physiological function in animals because of its role in foraging, mate selection and avoiding predators or poisonous agents (Hara 1975;Su et al. 2009;Baz aes et al. 2013;Hughes et al. 2018). The vertebrate olfactory system is able to detect and discriminate various odor molecules in the environment using the multigene family of olfactory receptor genes (ORs). Vertebrate ORs belong to the family of G-protein-coupled receptors that are composed of seven a-helical transmembrane (TM) regions with conserved motifs (Mombaerts 1999). OR genes are predominantly expressed in sensory neurons of the main olfactory epithelium (OE) in nasal cavities both in mammals (Vassar et al. 1993;van der Linden et al. 2018) and fish (Churcher et al. 2015;Cong et al. 2019). The diversity and large number of OR genes facilitate the discrimination of a diverse range of environmental odor particles and are thought to be critical for adaptation to local environmental conditions. The number of intact OR genes varies extensively among species of placental mammals ranging from $300 in orangutan to $2,000 in African elephants, and most species have a substantial number of OR pseudogenes (Niimura et al. 2014). The dramatic differences in OR repertoire and gene numbers among vertebrates result from frequent gene gains and losses through duplication and pseudogenization during evolution (Vandewege et al. 2016;Hughes et al. 2018;Niimura et al. 2018). As a consequence, such dynamic evolution of OR repertoire likely facilitates adaptation to different ecological niches (e.g., feeding ecology and habitat) (Hayden et al. 2010(Hayden et al. , 2014Vandewege et al. 2016;Hughes et al. 2018). In mammals, aquatic and terrestrial species differ in total number of OR genes per gene family (Hayden et al. 2010(Hayden et al. , 2014 and the OR repertoire has not only expanded but also contracted in association with changes to local environmental conditions (Hughes et al. 2018). Together, these results suggest that OR gene expansion has played an important role in ecological adaptation in mammals, but comparatively less is known about the role of water-soluble OR in ecological adaptation in fish.
OR genes in vertebrates are classified into nine subfamilies, a, b, c, d, e, f, g, h, and j (Niimura and Nei 2005;Niimura 2009). Most mammalian OR genes belonging to subfamily a and k, known as "mammalian-like" genes, are expressed in airfilled medial diverticulum and responsible for detecting airborne odorants, whereas fish express subfamily d, e, f, and g, referred to as "fish-like" genes, are expressed in water-filled lateral diverticulum and associated with detecting watersoluble odorants. These water-soluble odorants mainly include amino acids, bile acids, gonadal steroids, and prostaglandins, which are nonvolatile (Hara and Zhang 1996;Cong et al. 2019). Because subfamily b OR genes were present both in aquatic and terrestrial vertebrates, they are recognized as both water soluble and airborne odorants (Niimura 2009).
Fish represent one of the largest vertebrate groups with at least 20,000 known species, colonizing tropical, temperate, and polar waters as well as virtually all fresh-water environments. Like other vertebrates, the fish olfactory system is critical for behavior related to feeding, reproduction, predator avoidance and odorant-oriented migration (Laberge and Hara 2001;Hamdani and Døving 2007). The recent availability of a wide taxonomic breadth of fish genome sequences offers an opportunity to explore the evolution of OR repertoires in this group. To date, most previous fish OR studies only focused on the gene identification, whereas the evolutionary dynamics of fish OR gene families and their role in adaptation to different ecological niches are completely unknown.
We tested the hypothesis that the evolution of fish OR gene repertoire has been influenced by habitat and feeding habit and identified which OR gene subfamilies are important for each ecological niche. We generated a high-quality reference genome assembly of a commercially important herbivorous fish, the blunt snout bream (Megalobrama amblycephala), to explore the OR gene family repertoire across 28 phylogenetic and ecologically diverse fish species.

Genome Sequencing Assembly and Annotation
We produced a high quality reference genome using a combination of PacBio long reads (supplementary table 1 We used our previously published high-density linkage map (Liu et al. 2017) to order scaffolds and generate a chromosome-level assembly. Of 868 scaffolds in the assembly, 650 anchored onto 24 chromosomes with a total length of 1,078 Mb, representing 97.2% of the assembled genome sequences (table 1; supplementary table 4, Supplementary Material online). Although M. amblycephala shared a most recent common ancestor (MRCA) with Danio rerio $54 Ma (Liu et al. 2017), the genome of M. amblycephala retained strong collinearity with that of D. rerio ( fig. 1). No large interchromosomal translocation was found between the 25 chromosomes of D. rerio and the 24 chromosomes of M. amblycephala. One notable difference is that M. amblycephala Chr2 corresponds to D. rerio Chr10 and 22 ( fig. 1). We used Maker (Holt and Yandell 2011) to annotate the novel reference genome and quantify OR genes copies in M. amblycephala. In doing so, we discovered a notable expansion of OR genes that are mainly located on Chr16 and 18 ( fig. 1). To evaluate whether this expansion was associated with ecological factors, we performed comparative analysis across other fish species with variation in habitat.

Identification of or Repertoires Across Fish Genomes
We identified the olfactory genes in the genome assemblies of M. amblycephala and 27 other fish species for which deepcoverage genome sequences are available (supplementary table 5, Supplementary Material online). These 28 species span 19 different fish orders and include seven other Otophysa species in addition to M. amblycephala ( fig. 2A). Following an extensive homology search and manual curation, we identified 4,288 OR genes and classified them into three categories, intact genes (putatively functional genes, n ¼ 3,253), truncated genes (n ¼ 122), and pseudogenes (n ¼ 913, fig. 2A). The proportion of pseudogenes among fish species was highly variable and ranged from 5.3% in D. rerio to 37.8% in Xiphophorus maculatus. We also found extensive variation in the size of OR repertoires and the number of intact genes ranged from 20 in Mola to 279 in Lates calcarifer while the reference, M. amblycephala, contained 223 intact OR genes ( fig. 2A). The number of OR genes varies in a lineage-specific manner, for example, as in Tetraodontiformes, M. mola and Takifugu rubripes which have low numbers of OR (20 and 61, respectively) whereas in Cypriniformes we found more than 120 OR genes in each species.

OR Gene Family Phylogeny and Classification
To examine the evolutionary relationships and classify the subfamilies of OR genes, we constructed a Maximum Likelihood (ML) tree using 3,253 amino acid sequences from 28 fish genomes (supplementary fig. 1, Supplementary Material online). Here, we restricted our analysis to only include intact genes because most pseudogenes contained deletions and truncated genes were much shorter than the intact genes. The OR genes clearly separated into nine a priori subfamilies a, b, c, d, e, f, g, h, and j (Niimura 2009) and the major clades of the tree were supported by high bootstrap values (more than 70% in 1,000 replicates). The d subfamily is Expansion of OR Genes in FW Fish . doi:10.1093/molbev/msab152 MBE the largest subfamily accounting for more than 50% of the total number of identified ORs followed by subfamily f and g. OR genes generally form tandem arrays that are highly conserved across distantly related species (supplementary fig. 2  Genome of M. amblycephala sequenced in this study is highlighted in red. Habitat and feeding habits are depicted for each species. Bars represent the total OR numbers. (B) The number of OR genes from each subfamily across 28 fish species. Red, yellow and blue bars represent intact genes, truncated genes and pseudogenes, respectively. "Water," "Air," and "Water/Air" represent the detection of water-soluble, airborne and both water and airborne odorants, respectively.

Gains and Losses of OR Genes in Fish during Evolution
To investigate the evolutionary dynamics in the number of OR genes in teleost, we estimated the OR repertoire size in ancestral species and calculated the numbers of gene gains and losses for each branch during the evolution and speciation of the 28 fish species. The results showed that gains and losses of OR genes have occurred frequently in each taxonomic lineage ( fig. 3A). Two species with similar numbers of OR genes may have very different OR gene repertoires. For example, X. maculatus and Oryzias latipes both have 61 OR genes at present, whereas their common ancestor was estimated to have 32 ( fig. 3A). Similarly, in Silurus meridionalis and Ictalurus punctatus, each species has $90 ORs, but only $60 is shared between them. The MRCA of Osteichthyes was estimated to have had 62 intact ancestral OR genes with all subfamilies represented. Several major gain and loss events happened, including the loss of 10 OR genes in the Teleostei lineage and 26 gained in the Otophysa clade when compared with their MRCA, Neopterygii and Clupeocephala lineage, respectively. These 26 gained OR genes belong to subfamily b, e, f, and g ( fig. 3A).
In mammals and birds, the number of subfamily a and c subfamilies of OR genes are preferentially expanded ( fig. 3B), whereas subfamily d, e, f, and g were completely lost (Niimura 2009). Amphibians retain nearly all OR subfamilies except f which are sensitive to both water-soluble and airborne odorants (Niimura 2009), consistent with their aquatic and terrestrial lifecycles. The L. oculatus genome also contains all subfamilies of OR genes.

Phylogenetic Generalized Least-Squares Regression Analysis
To investigate whether OR repertoire size is related to the ecological factors, we performed a phylogenetic generalized least squares (PGLS) regression analyses. We used two ecological niche factors as predictor variables for this analysis:

MBE
Including habitat (marine, freshwater [FW] and both marine and FW) and feeding habit (carnivorous, omnivorous, and herbivorous) (supplementary note 1, Supplementary Material online). Although the habitat was not a significant predictor of the total number of intact OR genes, we found a strong effect of habitat on the number of subfamily b (Pagel's 4A). However, we did not find a significant association between the number of OR repertoires and feeding habits (supplementary fig. 3A, Supplementary Material online).
To further explore the difference of OR repertoires among specified niches groups, one-way ANOVA analysis was performed ( fig. 4B). The results indicated that the numbers of subfamily b (P ¼ 0.002), e (P ¼ 0.0008), and f (P ¼ 0.05) OR genes in FW species were significantly higher than that in marine fish species. Additionally, the number of subfamily b (P ¼ 0.02) and e (P ¼ 0.04) genes in FW fish were also significantly higher than in fish living in both FW and marine water. For feeding habits, only the number of b OR genes in the herbivorous species was significantly higher than in The majority of fish species included in this study have one or two copies of b OR genes, whereas all Otophysa species show an expansion of the b subfamily (more than 10 on average). Specifically, multiple highly supported clades of b OR genes suggest recurrent and lineage-specific expansions for Otophysa b ORs (fig. 5A). Similarly, genes in subfamily e also independently duplicated multiple times in the Otophysa lineage ( fig. 5B). To test whether these duplicated copies are under selection, we used an adaptive branch-site random effects likelihood (aBSREL) model to explore the presence/absence of selection by measuring rates of nonsynonymous to synonymous (dN/dS ¼ x) substitutions in genes. We found that four and three branches show strong

Expression Patterns of OR Genes
To determine which of the OR genes are being used, we performed RNA-seq on the OE, whole brain (B), and olfactory bulb (OB) of adult M. amblycephala and analyzed previously reported OE transcriptome data from adult D. rerio and of different life stages of A. anguilla ( fig. 6). Our results demonstrate that the majority of candidate OR genes are expressed in the OE ( fig. 6). In M. amblycephala, OR genes were not expressed in brain and OB, whereas 207 of 223 intact ORs were expressed in OE. In D. rerio, 153 of 159 intact ORs were expressed in OE. In A. anguilla, all the OR genes were expressed in OE except one f subfamily. Importantly, the expression level of the majority ORs of A. anguilla were higher in sexually mature male (SM) than in sexually immature FW and seawater (SW) OE. These results suggest that most intact ORs were expressed and putatively functional in the OE of these species.
To evaluate whether there are differences in OR expression levels among the three species, we plotted phylogenetic trees generated from all intact OR sequences overlaid with the normalized values corresponding to their expression ( fig.  7A). Half of eight identified 1:1:1 OR orthologs were expressed at significantly higher levels (P < 0.01) in A. anguilla than in M. amblycephala OE (fig. 7B). Among the 1:1:n and 1:n:n orthologs, nearly all the duplicated ORs are highly expressed ( fig.  7C). To further validate the transcriptome results and verify  7D). All the tested b and e OR genes were highly expressed in OE of male and female M. amblycephala ( fig. 7D), whereas these were not expressed in muscle tissue, OB and brain except b2 e7, e13 with slight expression in male M. amblycephala brain. The expression trend of these ORs revealed using qRT-PCR analysis was consistent with that detected in the OE transcriptome analysis.

Discussion
Sensory systems play a crucial role in the aquatic lifestyle of fish in feeding, migration, spawning, and defense (Hara 1975). Gene expansion in ORs is thought to have played an important role in the transition from aquatic to terrestrial living in mammals (Niimura 2009;Hayden et al. 2010), but the role of ORs expansion among aquatic species has not been previously studied using comparative evolutionary analysis. Using a high-quality genome assembly of M. amblycephala and data from 27 taxonomically diverse fish species, we found that the numbers of intact OR genes have changed extensively during fish evolution. We find evidence for an expansion of OR genes in FW species, in particular as regards members of the b and e subfamilies, which most likely reflects ecological adaptation.
Genomic surveys have revealed that ORs represent the largest gene family in mammals (Zhang and Firestein 2002) and some studies indicate that OR repertoires vary widely among vertebrates even between closely related taxa (Matsui et al. 2010;Niimura et al. 2014Niimura et al. , 2018. The variation in the number of intact OR genes is >10-fold among the 28 teleosts examined in this study, ranging from 20 in M. mola to 279 in L. calcarifer (supplementary table 6, Supplementary Material online). This result is consistent with previous analyses of the total number of OR genes from the whole genome data of T.  (Ao et al. 2015). The MRCA between fish and tetrapods had at least nine subfamilies of OR genes (Niimura and Nei 2005) and at least six (d, e, f, g, b, and c) out of the seven classified subfamilies of intact OR genes are retained in fish ( fig. 3; supplementary table 6, Supplementary Material online). The pattern is opposite in mammals, where only two subfamilies (a and c) are retained, but their copy numbers are greatly expanded (Niimura 2009).
Most fish possess a well-developed olfactory system and have a relatively large OB and OE. But why do the number of intact OR genes vary so much among fish species? We predicted that OR genes might vary according to different habitats and lifestyles. Gene duplications followed by subfunctionalization is one possible mechanism for genetic adaptation ( Within gene clusters, different subfamilies of OR genes are largely contiguous and located close to each other and with the same transcriptional orientation, suggesting tandem duplication as a mechanism of gene expansion, consistent with a previous fish study (Alioto and Ngai 2005).
Studies on reptiles, birds and mammals have suggested that expansion and contraction of OR genes are related to major changes in habitat and lifestyle (Niimura 2009;Hayden et al. 2010Hayden et al. , 2014Khan et al. 2015;Vandewege et al. 2016;Hughes et al. 2018). For example, previous study explored OR diversity in 2 reptiles and 48 birds, which indicated species MBE and lineage-specific variation in subfamily of OR genes associated with aquatic and terrestrial adaptations (Khan et al. 2015). Similarly, a study on OR genes in sauropsida suggests that adaption related to life history evolution have shaped the unique OR repertoires in different members of this subfamily (Vandewege et al. 2016). Our taxonomically diverse sampling allowed us to assess the relationship between habitat and feeding behavior in relation to OR gene expansion in fish. Transitions to different aquatic environments may be facilitated by the olfactory sensory organs and OR genes if olfaction is critical to survival. For example, elimination of visual function resulted in increased olfactory capabilities in the blind cave fish, A. mexicanus (Blin et al. 2018). In the present study, we observed a substantial expansion of subfamily b OR genes in A. mexicanus genome, which may compensate to some extent their loss of visual information (supplementary table 6, Supplementary Material online).
Our phylogenetically controlled environmental association analysis ( fig. 4) indicated that although the total number of intact OR genes was unaffected by environmental factors, there is a strong association between the number of subfamily b, e, and f OR genes and FW habitat. Most teleosts possess one or two b OR genes, whereas fish in the Otophysa lineage possess more than 10 on average ( fig. 5A; supplementary table 6, Supplementary Material online). Why might b OR genes have specifically expanded in FW Otophysa species? A possible explanation for this observation is that subfamily b OR genes are important for detecting both water soluble and airborne odorants (Niimura 2009). The large number of b and e OR genes in the FW Otophysa lineage may allow them to detect and discriminate a wide range of odorant stimuli that are important for survival in FW. Previous research has suggested that expansion or contraction of gene families can be a random process driven by drift and/or by natural selection. Earlier studies highlighted that positive selection shaped the diversification and variation of the OR gene family in ants (Engsontia et al. 2015), bees (Brand and Ram ırez 2017), and fish (Alioto and Ngai 2005;Hussain et al. 2009). However, in this study, no signatures of positive selection were found in the b and e ORs showing copy number expansion ( fig. 5).
Calculating total mRNA abundance of each OR permitted us to assess which receptors are functionally expressed in the OE transcriptomes of M. amblycephala and D. rerio (FW fish), as well as A. anguilla (both living in FW and SW). We present evidence that almost all the intact OR genes belonging to different subfamilies are expressed and putatively functional in OE of these species (fig. 6). Similar OR expression patterns have also been observed in the mouse (Ibarra-Soria et al. 2014) and human (Olender et al. 2016). Although the number of intact ORs (108) in A. anguilla is lower than in D. rerio (159 intact ORs) and in M. amblycephala (223 intact ORs), the expression levels of four out of eight 1:1:1 OR orthologs in SW stage A. anguilla were significantly higher than that in FW M. amblycephala ( fig. 7B). Previous work describing differential expression of OR genes in OE of wild Atlantic salmon (Johnstone et al. 2011) and European eel (Churcher et al. 2015) suggests that the regulation of these genes is associated with different physiological states and responses to environmental cues.
In summary, we present the first phylogenetic comparative analysis of OR genes expansion and contraction in fish. We show that expansions of subfamily b and e OR genes have occurred in the FW Otophysa lineage and that b and e OR genes expansion is consistently associated with FW habitats in other species. Gene expression analyses indicated that nearly all the intact OR genes including expanded subfamily b and e OR genes in FW fish are expressed in olfactory organs, strongly supporting their functional importance, which have potentially facilitated FW adaptation.

Genome Sequencing and Assembly
Genomic DNA was isolated from whole blood of a double haploid fish as described in our previous study (Liu et al. 2017). The genomic DNA was fragmented to 20 kb and sequenced with the Pacific Biosystems RSII platform. The genome assembly was performed using a combination of sequencing technologies: PacBio RS II reads, Illumina pairedend reads (PE), and Illumina mate-pair reads (MP). Briefly, high-quality Illumina PE reads were separately assembled into Illumina contigs using Platanus (v1.2.4) (Kajitani et al. 2014). Next, low-quality PacBio subreads with a read length shorter than 1,000 bp or a quality score lower than 0.8 were filtered out. The remaining PacBio subreads were errorcorrected with MECAT (v 1.0) (Xiao et al. 2017). Then, the error-corrected PacBio reads and Illumina contigs were combined to perform a hybrid assembly using the DBG2OLC (Ye et al. 2016) pipeline. Illumina reads (65Â coverage) were mapped to the contigs using BWA-aln. This alignment was then used to correct the assembly with Pilon 1.22 (Walker et al. 2014). A total of 69.15 Gb of Illumina MP data (approximately 61Â), with an insert size varying between 2 and 20 kb, was used to scaffold the assembly with SSPACE (v3.0) (Boetzer et al. 2011). Then, the gaps were filled with GapFiller (v1.10) (Nadalin et al. 2012) and PBjelly (v1.2) (English et al. 2012). To further estimate the completeness of the assembly, Core Eukaryotic Genes Mapping Approach was performed using benchmarking universal single-copy orthologs (BUSCO) (SimAo et al. 2015).

Chromosome-Level Assembly and Synteny Analyses between M. amblycephala and D. rerio
A RAD genetic linkage map of M. amblycephala (Liu et al. 2017) composed of 14,648 SNP markers was used to organize and orientate the scaffolds into chromosome-sized sequences. The RAD tags corresponding to these 14,648 SNP markers were mapped onto the genome using BWA-aln. Then, we placed and oriented the scaffolds and contigs relative to each other with the physical and genetic positions of the mapped markers. To identify syntenic blocks, the protein sequences from M. amblycephala and D. rerio were searched against each other using BLASTp (E < 1eÀ5). The results were subjected to MCScan (-a, -e : 1e-5, -u : 1, -s : 5) to determine syntenic blocks.

Genome Annotation
We used a combination of homology-based, de novo and RNA-seq annotation methods to predict genes in the assembled genome. For the sequence similarity-based prediction, protein sequences from D. rerio, G. aculeatus, L. crocea, L. calcarifer, L. oculatus, O. niloticus, O. latipes, and S. grahami were mapped to M. amblycephala genome using genBlastA v1.0.1 (She et al. 2009) with an E-value threshold of 1eÀ5. Subsequently, homologous genome sequences were aligned against the matched proteins to define gene models using GeneWise (v2.2.0) (Birney et al. 2004). For the de novo gene predictions, AUGUSTUS (v2.5.5) (Stanke et al. 2006) was used to identify candidate protein-encoding genes in the masked genome with self-trained model parameters. Then unigene sequences from nine transcriptoms sequenced this study were used to search transcripts region using BLAT v. 36 and PASA (Haas et al. 2003). Finally, the homology-based, de novo-derived and transcript gene sets were merged to generate a high confidence gene set using Maker 2.31.10 (Holt and Yandell 2011). To assign functions to the gene models, we performed BLASTP, with an E-value threshold of 10 À5 , using the NR, KOG, SwissProt, and TrEMBL databases (Uniprot release 2017-09). The gene motifs and domains were determined using InterProScan (version 5.16) (Zdobnov and Apweiler 2001) against public protein databases. Functional annotation was carried out using Blast2GO (Conesa et al. 2005).

Identification of OR Genes from Fish Genome Sequences
To identify OR genes from 28 complete fishes genome (supplementary table 5, Supplementary Material online), we followed a bioinformatics pipeline similar to previously described methods (Niimura and Nei 2007;Niimura 2009) with some modifications. Briefly, we used a first-round of TBLASTN searches (Altschul et al. 1997) with a cutoff E-value of 1eÀ5 against each fish genome sequence using a set of known functional OR genes sequences from D. rerio, O. latipes, O. niloticus, L. chalumnae, L. oculatus, L. vexillifer, and T. rubripes as queries. We then predicted the structure of the OR genes using the blast-hit sequence with GeneWise (Birney et al. 2004), extending in both 3' and 5' directions along the genome sequences. Hits shorter than 200 bp were discarded. All best-hits from the genome sequence were extracted. We classified OR genes with interrupting stop codons or frameshifts were as pseudogenes. We classified a truncated gene as those with a partially intact sequence encoding a part of an OR and were validated by alignment to functional genes using the program MUSCLE (Edgar 2004). The longest coding sequences from the start (ATG) codon to a stop codon with an uninterrupted open reading frame and seven transmembrane domains were considered as intact OR genes. To classify OR genes into different subfamilies, we used BLASTP to intact ORs into putative subfamilies based on the classification of zebrafish and pufferfish OR genes (Niimura and Nei 2005). A phylogenetic tree was constructed using the ML method in MEGA 5.10 (Tamura et al. 2011) with 1,000 replicates to verify and correct the putative BLASTP-based assignments. Finally, we aligned all identified intact OR genes found in this study using MUSCLE. The alignment was manually corrected and used to construct a phylogenetic tree by FastTree2 (Price et al. 2010). The phylogenetic tree was displayed and labeled using Interactive Tree Of Life (iTOL) v4 (Letunic and Bork 2019). All the identified intact and truncated OR nucleic and translated protein sequences in this study are shown in supplementary notes 4 and 5, Supplementary Material online.
Collinear Analysis of OR Genes Among M. amblycephala, D. rerio, and C. idellus Syntenic blocks between M. amblycephala and D. rerio, and between M. amblycephala and C. idellus were firstly identified using BLASTp and MCScan following methods described above. Then, the OR genes from these three fish were positioned back on their genome to determine which genes belong to which syntenic region. Finally, variant sites among OR genes located in syntenic region were identified by multiple sequence alignment using MUSCLE.

Estimation of OR Genes Gain and Loss Events
We used CAFE 0 (De Bie et al. 2006) to reconstruct the OR repertoires and calculate copy numbers for ancestral OR genes in each lineage using all the intact ORs identified from fish genomes. Firstly, a data file containing the sizes of all the OR gene subfamilies were prepared. Divergence times for each node in the CAFE 0 analyses were estimated from the Liu et al. . doi:10.1093/molbev/msab152 MBE phylogenetic tree ( fig. 2A). The CAFE 0 method employs a random birth and death model to estimate gene gains and losses in each lineage. The global parameter k described both the gene birth (k) and death (l ¼ Àk) rate across all branches in the tree for all gene subfamilies was estimated using ML. A conditional P-value was calculated for each gene family, and families with conditional P-values of <0.05 were considered to have a notable gain or loss. Molecular Evolution and Selection Analyses of Subfamily b and e OR Genes All intact subfamily b and e OR genes from 28 fish genomes (b, n ¼ 164 and e, n ¼ 161) were aligned to prepare the alignments. The OR genes trees were constructed using the ML method in MEGA 5.10. We used the aBSREL approach (Smith et al. 2015) in order to test for evidence of episodic positive selection implemented in Datamonkey (Weaver et al. 2018). In aBSREL, a likelihood ratio test was performed to compare the null model (x ¼ 1) against the alternative, where the branch was undergoing some form of selection (x 6 ¼ 1). We used a threshold of P < 0.05 (after correction for multiple testing) to infer positive selection at the marked branches.

PGLS Regression Analysis
We use phylogenetic generalized least-squares regressions (PGLSs) to establish the relationships between the numbers of OR genes and ecological factors (feeding habit and habitat) of each fish species while controlling for phylogenetic effects. Feeding habit was categorized as carnivorous, omnivorous or herbivorous and habitat was categorized as marine, FW , or both marine and FW (supplementary note 1, Supplementary Material online). The PGLS analyses were performed using the R packages "caper" (Orme 2013). The input phylogenetic tree was from our present study ( fig. 2A). Here, we used Pagel's k (Pagel 1999) with the value ranging from 0 (phylogenetic independence) to 1 (phylogenetic dependence) to estimate the degree of phylogenetic signal of each trait.

RNA-Seq Analysis
The experimental procedures were approved by the Animal Care and Use Committee of Huazhong Agricultural University. The OE, OB, and whole brain (B) were collected from the adult male M. amblycephala and immediately frozen in liquid nitrogen. All samples were prepared in triplicate. Total RNA was isolated from each sample with RNAiso Plus (TaKaRa, Dalian, China). A total of nine RNA-seq libraries were constructed and sequenced in BGI by BGISEQ-500 platform (Shenzhen, China). For comparison, the raw data from OE of adult D. rerio and different life stages, FW, SW, and SM of A. anguilla were downloaded for further analyses. The detailed sampling information is shown in supplementary table 8, Supplementary Material online.
All raw data were assessed using fastp v0.20.0 (Chen et al. 2018). After removing the adapters and poly N or low-quality sequences (Q < 15), the remainder were termed as clean reads. High quality clean reads from each sample were separately aligned to M. amblycephala (this study), D. rerio (GRCz11.99, Ensembl) and A. anguilla (GCA_000695075.1, NCBI) genome using Hitsat2 v2.0.4 (Pertea et al. 2016). Estimated mapped read counts and transcript lengths were used to calculate transcripts per million (TPM) using RSEM (v1.2.25) with the default settings (Li and Dewey 2011). To reduce the variation among the three fish species, the DESeq2 package (Love et al. 2014) was also used to estimate the size factors and dispersion and to generate a normalized counts matrix using the 4,404 single-copy orthologs. TPM values were subsequently transformed to log 2 (TPM þ 1).

qRT-PCR Analysis
A total of 8 representative b and e OR genes in four tissues (muscle, OE, OB, and B) of adult male and female M. amblycephala were detected by using qRT-PCR. The OE, OB, B, and muscle tissues were collected from the adult male (n ¼ 9) and female M. amblycephala (n ¼ 9) and immediately frozen in liquid nitrogen. Tissues from three individuals were pooled to extract RNA and three independent biological replicates for male and female were separately prepared. cDNA was synthesized from 1 lg of total RNA using a reverse transcriptase kit from TaKaRa Biochemicals (TaKaRa, Dalian, China). Primers were designed using Primer 5.0 software (supplementary table 9, Supplementary Material online). b-actin served as an internal normalization control for qRT-PCR analysis. PCR reactions contain 1 ll cDNA, 1 ll forward and reverse primers, 10 ll SYBR Green PCR Master Mix (TaKaRa, Dalian, China). A qRT-PCR was performed using a Roter-gene Q (Qiagen, Hilden, Germany) with one cycle of predenaturation at 95 C for 45 s, followed by 40 cycles of amplification at 95 C for 15 s, 60 C for 15 s, and 72 C for 30 s.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Data Availability
The data underlying this article (raw data of whole-genome sequencing, genome assembly and RNA-seq data) have been deposited at NCBI BioProject database under Bioproject ID PRJNA343584.