Genomic Architecture of Rapid Parallel Adaptation to Fresh Water in a Wild Fish

Abstract Rapid adaptation to novel environments may drive changes in genomic regions through natural selection. However, the genetic architecture underlying these adaptive changes is still poorly understood. Using population genomic approaches, we investigated the genomic architecture that underlies rapid parallel adaptation of Coilia nasus to fresh water by comparing four freshwater-resident populations with their ancestral anadromous population. Linkage disequilibrium network analysis and population genetic analyses revealed two putative large chromosome inversions on LG6 and LG22, which were enriched for outlier loci and exhibited parallel association with freshwater adaptation. Drastic frequency shifts and elevated genetic differentiation were observed for the two chromosome inversions among populations, suggesting that both inversions would undergo divergent selection between anadromous and resident ecotypes. Enrichment analysis of genes within chromosome inversions showed significant enrichment of genes involved in metabolic process, immunoregulation, growth, maturation, osmoregulation, and so forth, which probably underlay differences in morphology, physiology and behavior between the anadromous and freshwater-resident forms. The availability of beneficial standing genetic variation, large optimum shift between marine and freshwater habitats, and high efficiency of selection with large population size could lead to the observed rapid parallel adaptive genomic change. We propose that chromosomal inversions might have played an important role during the evolution of rapid parallel ecological divergence in the face of environmental heterogeneity in C. nasus. Our study provides insights into the genomic basis of rapid adaptation of complex traits in novel habitats and highlights the importance of structural genomic variants in analyses of ecological adaptation.


Introduction
Understanding the mechanisms that facilitate adaptation of populations to changing environments is a long-standing central goal of evolutionary biology (Savolainen, et al. 2013). Although populations can evolve rapidly in response to sudden environmental changes (Lescak, et al. 2015;Reid, et al. 2016), genetic architecture of rapid adaptation is still not well understood for complex traits controlled by a large number of genetic and environmental influences (Jain and Stephan 2017). While empirical population genetics supports rapid adaptation as a result of selective sweep with large frequency changes of novel mutations at single or few loci, quantitative genetics presumes that phenotypic adaptation results from subtle allele frequency shifts at many loci (Pritchard and Di Rienzo 2010).
Owing to recent advances in genomic technologies, emerging evidence suggests that structural genomic variants (SVs) of diverse forms are taxonomically ubiquitous and play a major role in a multitude of ecological and evolutionary processes . SVs are predicted to be favored under adaptation and are expected to change the evolutionary trajectory of polygenic traits under selection because they involve many genes acting together like supergenes of large effect, rather than many loci of small effect . Mounting evidence shows that chromosomal inversions are the most frequent SVs associated with adaptive phenotypes and have a pervasive role in ecoevolutionary processes, from mating systems, environmental adaptation, and reproductive isolation to speciation (Wellenreuther and Bernatchez 2018).
Evolutionary transitions from marine to freshwater environments are important in generating phyletic diversity within fishes (Vega and Wiens 2012). Transitions from marine to freshwater habitats constitute dramatic shifts between 'adaptive zones', which represent considerable adaptive challenges because many environmental conditions vary (e.g., salinity, oxygen, pH, food, predators, pathogens, symbionts, etc.) (Lee and Bell 1999). Adaption to fresh water likely involves multiple highly polygenic traits that contribute to the complex adaptive phenotype (Brennan, et al. 2018). The Japanese grenadier anchovy, Coilia nasus, exists in two distinct life history forms (Yuan, et al. 1976). One is an ancestral anadromous form, widely distributed in coastal and estuarial regions of the Northwest Pacific, which migrates upstream into fresh water in the spring for breeding. The other is a landlocked freshwater Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 resident form spending its whole life in the affiliated lakes in lower reaches of the Yangtze River in China, such as Taihu Lake, Chaohu Lake, and Hongze Lake, and serving as the most dominant species in the lake ecosystem (Yuan, et al. 1976;Cheng, et al. 2019). A recent study demonstrates that the freshwater resident form is not genetically distinguishable from the anadromous form, and had invaded freshwater environment after the formation of the lakes in late Holocene (Cheng, et al. 2019). As a result, the freshwater resident and anadromous C. nasus in the Yangtze River watershed are generally each other's closest relatives. Because of the varied selection regimes in freshwater habitats, the derived freshwater resident populations have consistently acquired a specific set of morphological, physiological and behavioral traits allowing them to reside in fresh water, indicating rapid phenotypic changes in adaptation to fresh water (Yuan, et al. 1976). Besides the complex combination of behavioral traits associated with migration, the anadromous and resident forms differ in an array of morphological traits, including number of vertebrae, number of soft rays of anal fin, eye size, shape and size of liver, ovary shape, and body color (Yuan, et al. 1976). Furthermore, helminth community, muscle lipid content, and feeding behavior at spawning season are also different between the two forms (Yuan, et al. 1976;Li and Wang 2014). These parallel phenotypic changes may have the same underlying genetic basis or may involve different genetic changes. Therefore, this natural system provides a compelling opportunity to infer genetic architecture of rapid phenotypic adaptation by comparing sets of derived freshwater populations with their anadromous ancestors.
Here, we seek to investigate genomic architecture that underlies rapid parallel adaptation to fresh water in C. nasus by comparing four freshwater resident populations with their anadromous ancestors in the Yangtze River system. We performed joint analysis of SNPs and SVs in a population genomic framework to characterize how differentiation is structured across the genome, which would allow us to assess whether the same chromosomal features are implicated in divergence between the anadromous population and the four freshwater resident populations from different lakes. The source of adaptive genetic variants for freshwater adaptation was also illustrated. The results will provide a general understanding of the genetics of rapid adaptation of complex phenotypes to changing environment in wild populations. Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020

Genetic variation, Ne estimation, population structure and demographic history
The published chromosome-level genome assembly for C. nasus (Xu, et al. 2020) contains a total of 81,894 EcoRI restriction sites (one per 10 Kb) across 24 linkage groups (chromosomes). The RAD sequencing covered about 110 Mb (14%) of the total genome with a size of 812 Mb ( Supplementary   Fig. S1). After comparison with the reference genome, 6,542,393 SNPs were called. A total of 123,792 SNPs were retained after the filtering procedure, which were distributed evenly across the genome with a mean depth of 20× (Supplementary Fig. S2 and Table S1) and covered ~65% of the whole genome when considering a moderate linkage disequilibrium (LD) extent of 10 Kb. There was an average of 5,158 SNPs per chromosome with a minimum of 3,204 SNPs on LG20 and a maximum of 8,071 SNPs on LG14. Most (85%) of the SNPs were within 10 Kb from their nearest neighbors ( Supplementary Fig. S3). The observed and expected heterozygosity were similar among populations, with HO ranged from 0.21 to 0.23 and HE ranged from 0.20 to 0.24 (Supplementary Table S2). The Ne estimate for the Yangtze River Estuary anadromous population was 12,753 (95% CI: 8,735-23,614), which was much larger than those in the four freshwater resident populations. In addition, the Ne estimates of Taihu Lake population (3,494; 95% CI: 3,163-3,904) and Chaohu Lake population (3,492;95% CI: 3,850) were a magnitude higher than those of Hongze Lake population (171; 95% CI: 170-172) and Luoma Lake population (160; 95% CI: 160-161) (Supplementary Table S2).
The genome-wide fixation indexes (FST) between the anadromous population and the four freshwater resident populations were generally low but significant, with an average value of 0.07 for all SNPs and 0.03 for neutral SNPs, indicating that the freshwater populations were recently derived from the anadromous one (Supplementary Table S3). Admixture results indicated that genetic variation was strongly partitioned by geography with highest support for K = 4 (Hongze Lake and Luoma Lake formed one cluster and other three populations formed distinct clusters) ( Fig. 2A and Supplementary   Fig. S4), which was consistent with neighbor-joining (NJ) tree based on FST (Fig. 2C). Principal component analysis (PCA) based on the first two PCs indicated that individuals from Hongze Lake and Luoma Lake were distinct from the other populations, and individuals from Chaohu Lake and Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 Taihu Lake also formed two closely related clusters, while individuals from Yangtze River Estuary population were overlapped with those from Chaohu Lake and Taihu Lake (Fig. 2B).
The demographic analysis indicated that scenario 4 was the best model, in which the freshwater resident populations were derived independently from their common ancestral anadromous population ( Supplementary Fig. S5). The posterior probability of scenario 4 was significantly higher than those of the other scenarios (posterior probability = 1.00, 95% CI: 0.99-1.00; Supplementary Table S4). The Hongze Lake population was derived from the Yangtze River Estuary population first, which happened about 365 (95% CI: 105-1,100) generations ago. Then the Chaohu Lake population and Taihu Lake population formed in sequence, which happened about 83 (95% CI: 48-120) and 61 (95% CI: 32-96) generations ago, respectively (Supplementary Table S5).

Candidate outlier SNPs and parallel allele frequency shifts
The two methods of outlier detection identified a total of 16,960 outlier loci across four population pairs, of which 2,814 were detected by both methods (Fig. 3A, Supplementary Fig. S6 and Table S6).
Fisher's exact test (FET) identified a total of 5,835 outlier SNPs, of which 1,269 were shared among all four population pairs. Pcadapt detected a total of 13,939 outlier SNPs, of which 1,188 were shared among all population pairs. The number of outlier SNPs detected by both methods in each population pair ranged from 1,568 in Taihu Lake vs. Yangtze River Estuary to 1,849 in Luoma Lake vs. Yangtze River Estuary. At last, a total of 1,147 outlier SNPs detected by both methods were shared among the four population pairs. Most (96%) of the 1,147 candidate outlier SNPs were preferentially distributed on chromosomes LG6 (263) and LG22 (838), clustering into similar genomic islands among population pairs (Fig. 3A). The rest 46 outlier SNPs were located on six chromosomes, with an average of eight SNPs on each chromosome. These results suggested that genomic regions exhibiting signatures of selection were remarkably consistent across multiple, independently derived freshwater populations, indicating that parallel phenotypic evolution in C. nasus may be occurring through extensive parallel genetic evolution.
Large allele frequency shifts were observed for the freshwater favored allele (FWA) of the 1,147 candidate SNPs in all freshwater-anadromous population pairs (Fig. 3B, C). The FWA of most candidate SNPs were nearly fixed in the freshwater populations. The average frequency of the FWA was 0.89 in Taihu Lake and Chaohu Lake, which were higher than in Hongze Lake (0.80) and Luoma Lake (0.74) (Fig. 3B). In contrast, the frequency of FWA in anadromous population was generally low with an average of 0.12 (SD = 0.07). Frequency of the FWA of most candidate outlier SNPs had increased by at least 0.5 in different freshwater populations (94%-99%), with an average frequency increment across all candidate outlier SNPs ranging from 0.62 in Luoma Lake to 0.77 in Chaohu Lake and Taihu Lake (Fig. 3C). Both the frequency and the increment of frequency for the FWA were higher in Taihu Lake and Chaohu Lake than those in Hongze Lake and Luoma Lake, which was consistent with their estimates of Ne.
The FWA for 1,024 of the 1,147 candidate outlier SNPs (89%) occurred as standing genetic variation in the Yangtze River Estuary anadromous population, suggesting that standing variation was the predominant source for adaptation to fresh water. Considering the relatively low frequency of the FWA as standing variations in the anadromous population and the relatively small sample size, it was not unlikely that FWA of the other 11% candidate SNPs could also present as standing variation in the anadromous population. In the anadromous population, frequency of the FWA for the candidate SNPs (average = 0.12, SD = 0.07) was significantly higher than the minor allele frequency (MAF) of the neutral SNPs (average MAF = 0.11, SD = 0.10; Welch two-sample t test P < 2.2e-16; Supplementary   Fig. S7).

Putative chromosome inversion regions related to parallel freshwater adaptation
Linkage disequilibrium network analysis (LDna) identified a total of five single-outlier clusters (SOCs) on different chromosomes, with median LD r 2 values ranging from 0.33 to 1.00, number of loci ranging from 53 to 1,070, and sizes ranging from 1.61 Mb to 31.54 Mb (Table 1). Population genetic analysis of PCA and heterozygosity for the five SOCs suggested that these SOCs corresponded to five putative chromosome inversions which separated the individuals into three groups along PC1 ( Supplementary   Fig. S8). The PCAs of all SNPs in the five chromosome inversions displayed three distinct clusters: the homokaryotypes of reference arrangement, the heterokaryotypes and the homokaryotypes of the alternative arrangement. All chromosome inversions were characterized with high LD, reduced Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 heterozygosity in the homokaryotype, and strong divergence between the inverted and the un-inverted rearrangement. Three of the five putative chromosome inversions on LG3, LG18 and LG19 broadly separated individuals of Hongze Lake and Luoma Lake from the other populations, suggesting their association with geographical population structure. The other two chromosome inversions on LG6 and LG22 broadly separated the freshwater resident and anadromous ecotypes, suggesting their association with anadromous-freshwater adaptive divergence ( Fig. 4A and Supplementary Fig. S8 Table S7). Combined, the two inversion regions in LGs 6 and 22 covered more than 50 Mb (~6% of the genome) and contained more than 1,800 genes (Supplementary Table S7). However, the sizes of the inversions should be treated with caution, as their estimations were highly dependent on the accuracy of the reference genome assembly. The proportions of proper paired reads (defined as the forward read and the reverse read mapped on the same chromosome and with right orientation as well as proper insert size) aligned to the reference genome assembly were generally low for all individuals (~80%, Supplementary Fig.   S9), which further indicated that some putatively wrongly ordered contigs existed on linkage groups.
Thus, the quality of the chromosome-level reference genome for C. nasus is not very high. Moreover, we found a large "gap" in the distribution of outliers on LG22 (Fig. 3A), which was consistent with the identified positions of SNPs in the detected SOC. LD were also high between SNPs from either ends of the gap (median r 2 = 0.80), implying that there might be assembly errors here. We thus excluded this gap from the inversion region on LG22, which resulted in a size of 21.97 Mb. All the following related results were based on the "gap" removed inversion on LG22.
The frequency of the putative chromosome rearrangements on LG6 and LG22 displayed large shifts between the anadromous and freshwater resident populations. For inversion on LG6, all the individuals of the anadromous population (Yangtze River Estuary) were reference homokaryotype (AA), while most of the freshwater individuals were alternative homokaryotype (BB, mean karyotype frequency = 0.78, SD = 0.21) ( Fig. 4C and Supplementary Table S8). In particular, all the individuals of Chaohu Lake were alternative homokaryotype BB. The heterokaryotypic individuals (AB) with both inverted and un-inverted arrangements appeared in three freshwater populations, including Taihu Lake, Hongze Lake and Luoma Lake. For inversion on LG22, 17 of the 21 individuals (81%) of the Yangtze River Estuary population were reference homokaryotype (AA), while most of the freshwater individuals were alternative homokaryotype (BB, mean karyotype frequency = 0.71, SD = 0.22).
Interestingly, three individuals with heterokaryotype AB and one individual with alternative homokaryotype BB were found in the anadromous population ( Fig. 4C), which indicated that the inverted arrangement on LG22 presented as standing genetic variation in the anadromous population.
The large frequency shifts of chromosome rearrangements of both chromosome inversions on LG6 and LG22 between the anadromous and freshwater resident populations further explained the drastic allele frequency shifts for the detected outlier SNPs. The FST between the anadromous population and the four freshwater resident populations calculated with all SNPs located in the chromosome inversion regions on LG6 and LG22 showed elevated differentiation with an average value of 0.37, which was one order of magnitude larger than the FST values based on neutral SNPs (Supplementary Table S9).

Gene annotations of outlier SNPs and GO enrichment analyses of inversion regions
Annotations for 859 of the 1,147 candidate outlier SNPs were retrieved. Of these, 380 SNPs were located in genic regions with 23 in coding sequence (CDS) and 357 in introns, and 479 were within 10 Kb of the closest gene. For the 23 SNPs in CDS of 16 genes, eight were nonsynonymous mutations in six genes (Supplementary Table S10). Gene annotations of outlier SNPs indicated some genes that were possibly involved in adaptation to fresh water. These genes included genes encoding chloride intracellular channel protein, solute carrier family 25 member 47-A-like protein, and potassium channel subfamily K member 10A, solute carrier organic anion transporter family member, zinc transporter 2, potassium voltage-gated channel subfamily H member 5, and sodium bicarbonate cotransporter gene, etc. (Supplementary Table S10).
Gene Ontology (GO) enrichment analyses of genes within the two chromosome inversions in terms of biological process identified a total of 72 and 73 significantly enriched GO terms for inversion regions on LG6 and LG22, respectively (P < 0.01; Fig. 5; Supplementary Table S11 and Table S12).
These enriched terms were involved in diverse biological processes, e.g., osmoregulation, immunoregulation, growth and maturation, locomotion, thermal response and metabolic process, etc.

Discussion
Evolutionary change and adaptive divergence in natural populations can occur very rapidly, responding to strong selection over short ecological timescales (Carroll, et al. 2007). However, the genetic architecture underlying these adaptive changes is still poorly understood. Here, we provide strong evidence of rapid parallel adaptation to fresh water and propose that chromosome inversions form a significant part of adaptive genetic variation. The apparently elevated differentiation observed in the two chromosome inversions indicates their potential role in maintaining and redistributing the adaptive substrate to fuel rapid divergence during freshwater adaptation. These results should provide exciting insights into the genetic mechanism underlying rapid adaptation of complex traits to changing environments in natural populations.
Chromosome inversions can play a key role in adaptive divergence because they protect inverted sequences from recombination in heterokaryotypes, allowing the coinheritance of multiple favorable alleles (Wellenreuther and Bernatchez 2018). Given that suppressed recombination allows mutational differences to accumulate between their variants, chromosomal inversions may create 'genomic islands of divergence', as observed in our study. The two inversions exhibited parallel association with freshwater adaptation, providing strong evidence that adaptation of C. nasus to fresh water in different lakes evolved similarly and independently. Most notably, drastic frequency shifts were observed for both chromosome inversions between the anadromous and resident forms, suggesting they are strongly selected genomic regions in adaptation to freshwater environments. The results for enrichment analysis of genes within the two chromosome inversions in terms of biological process showed significant enrichment of genes involved in osmoregulation, immunoregulation, growth and maturation, locomotion, thermal response, and metabolic process, etc. Changes in these biological processes probably underlie differences in morphology, physiology and behavior between anadromous and freshwater resident forms (Yuan, et al. 1976). Furthermore, the gene annotations associated with outlier SNPs also indicated that genes within the two inversions might play key roles in promoting adaptive differentiation between the two ecotypes. For example, some outlier SNPs were located within or linked to genes possibly involved in osmoregulation, including members of solute carrier family 25, potassium voltage-gated channel subfamily H, potassium channel subfamily K, and genes encoding chloride intracellular channel protein, sodium bicarbonate transporter-like protein, etc. In three-spined sticklebacks (Gasterosteus aculeatus), mutations at genes encoding solute carrier proteins have been associated with annual salinity variation in the Baltic Sea (Guo, et al. 2015). The genes encoding solute carrier proteins also play key roles in promoting adaptive differentiation between the Gilbert Bay and offshore populations of the Atlantic cod (Gadus morhua), which experience different salinities (Sinclair-Waters, et al. 2018). Furthermore, alternative functional exons of a potassium voltage gated channel gene (KCNH4) are found on either side of one chromosomal inversion that has undergone parallel selection after freshwater invasion in three-spined sticklebacks, suggesting marine and freshwater specific isoforms (Jones, et al. 2012). The chloride intracellular channel proteins is one of the major classes of ion channels predominantly localized to intracellular membranes, which is important for maintaining ionic homeostasis of intracellular organelles (Gururaja Rao, et al. 2018).
Given the different salinities experienced by the anadromous and freshwater resident forms of C. nasus, these genes may play crucial roles in adaptive differentiation associated with osmoregulatory adaptation. Although the results of gene annotation are plausible, identifying the true target of selection is difficult within chromosomal rearrangements, where LD is strong (Hoffmann, et al. 2004). Since the two large chromosome inversions contain more than 1,800 genes, it is currently difficult to determine which genes and/or variants contribute to the fitness effects of the two inversions in freshwater adaptation of C. nasus.
Recent studies provide exciting insights into the role of SVs, in particular chromosome inversions, in adaptation and diversification in marine species. Chromosomal inversions have been proved playing an important role in repeated evolution of distinct marine and freshwater three-spined sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation (Jones, et al. 2012). In the Atlantic cod, five large putative chromosome inversions have been associated with Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 migratory behavior and geographical distribution, and are likely involved in the maintenance of genomic divergence on both sides of the Atlantic Ocean (Berg, et al. 2017). Furthermore, the chromosomal rearrangement on LG1 that comprises two adjacent inversions is associated with parallel patterns of divergence between migratory and nonmigratory ecotypes of Atlantic cod on both sides of the Atlantic, providing further support for its role in local adaptation (Kirubakaran, et al. 2016;Sinclair-Waters, et al. 2018). Likewise, a large inverted region located on chromosome Omy5 has been linked to life-history strategies of anadromous and resident rainbow trout (Oncorhynchus mykiss) (Pearse, et al. 2014). In the Atlantic herring (Clupea harengus), a 7.8 Mb inversion on chromosome 12 is found both in the East and West Atlantic, which possibly underlies ecological adaptation in relation to the water temperature during gonadal maturation before spawning or the water temperature at spawning/early larval development (Pettersson, et al. 2019). Recently, a putative polymorphic chromosome inversion is detected within the Northwest Atlantic lineage of the capelin (Mallotus villosus), which may facilitate local adaptation to environmental conditions prevailing at spawning sites (Cayuela, et al. 2020). In the marine snail (Littorina saxatilis), several candidate chromosomal inversions are associated with rapid parallel adaptation, which can store shared variation that fuels rapid parallel adaptation to heterogeneous environments (Morales, et al. 2019). The importance of chromosome inversions in ecological and evolutionary processes of these marine species indicates that the analysis of inversions as well as other structural variants should be better integrated in studies pertaining to the molecular basis of adaptation and diversification (Wellenreuther and Bernatchez 2018).
Besides the nature of the genes and genomic regions under selection, the rate of genomic adaptation is also determined by the degree of environmental change, the availability of beneficial mutations, and the efficiency of positive selection. Populations adapt to novel environments in two distinct ways: selection on pre-existing genetic variation and selection on new mutations (Barrett and Schluter 2008). Due to the immediately available beneficial alleles and their higher starting frequencies, adaptation is thought to be faster from standing variation than from new mutation (Innan and Kim 2004). The observation of selection at the same chromosome inversions on LG6 and LG22 and SNPs in independent replicate freshwater populations suggested that the selection resulted from standing genetic variation. Indeed, most FWA of the candidate SNPs and the alternative arrangement of the chromosome inversion on LG22 were found in the anadromous populations. In the anadromous populations, frequency of the FWA were significantly higher than MAF of the other SNPs, indicating that standing genetic variation with higher initial frequencies facilitated rapid adaptation to changing environment. The importance of standing genetic variation as source for adaptation has also been verified in genome-wide adaptation studies of a songbird (Lai, et al. 2019) and three-spined sticklebacks (Jones, et al. 2012). Recent simulation analyses demonstrate that when a population is adapting to a large optimum shift, a substantial allele frequency increase are expected for the selected alleles of loci with large effect size (Stetter, et al. 2018). Transitions from marine to freshwater habitats constitute dramatic shifts between 'adaptive zones' (Lee and Bell 1999), and this large optimum shift could facilitate strong selective strength on loci of large effect size (e.g., chromosome inversions acting as supergenes) in the rapid adaptation to fresh water. Ne is crucial in determining the effectiveness of selection relative to drift. Large populations experience less genetic drift than small ones, which increases the efficiency of selection and the power to detect selected alleles. Indeed, the frequencies of FWA and alternative arrangement of chromosome inversions on LG22 were higher in the two populations with larger Ne (Chaohu Lake and Taihu Lake) than in the two populations with smaller Ne (Hongze Lake and Luoma Lake), suggesting a stronger selective strength in populations with larger Ne. All these conditions combined could lead to the observed strong selection in rapid adaptation of C. nasus to fresh water in different lake populations. However, the loci that showed signals of selection but not in all the four freshwater populations may reflect local adaptation.

Concluding Remarks
Our results demonstrate strong evidence of repeated evolutionary change in response to similar selective environments in the freshwater adaptation of C. nasus, which contribute to the accumulating evidence that SVs form a significant part of adaptive genetic variation ). However, knowledge gaps for the two inversions in our study remain, including position of the inversion breakpoints and the identification of causal genes and/or mutations that facilitate inversion maintenance in the adaptation to fresh water of C. nasus. The two chromosome inversions can serve Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 as a good starting point for characterizing the genetic basis of freshwater adaptations in C. nasus.
Future work based on whole genome resequencing and/or long-read data combined with functional validations and experimental work could allow us to clarify the nature and contribution of the putative chromosomal inversions in freshwater adaptation. In further studies, SVs should be studied in detail in a wider range of non-model organisms, which will provide a more comprehensive understanding of their ecological and evolutionary implications.

Sampling and RAD-tag sequencing
A total of 111 specimens of C. nasus were collected from five populations during 2013-2016, which consisted of 21 individuals from Yangtze River Estuary, 18 individuals from Luoma Lake, 24 individuals from Hongze Lake, 24 individuals from Chaohu Lake, and 24 individuals from Taihu Lake ( Fig. 1 and Table S13). The Yangtze River Estuary samples were referred to as "ancestral anadromous population". Samples from Luoma Lake, Hongze Lake, Chaohu Lake and Taihu Lake were referred to as "derived freshwater resident populations". Muscles and fin clips were collected and immediately preserved in 95% ethanol. Genomic DNA was extracted by the standard phenol-chloroform extraction method. Samples were treated with RNase A to guarantee the DNA isolation without RNA. RAD libraries were prepared using the six bp cutter EcoRI following (Etter and Johnson 2012), and were sequenced (11-13 individuals per sequencing lane) on the Illumina HiSeq 4000 platform at Allwegene Technology Inc. (Beijing, China) using paired-end 150 bp chemistry.

RAD data filtering, SNP genotyping and filtering
Raw sequence reads were quality-filtered as follows: (1) read pairs with adaptors were removed using cutadapt v1.16 (Martin 2011); (2) low quality read pairs were removed using "process_radtags" in Stacks v1.48 (Catchen, et al. 2011) with a sliding-window of 10% and quality score of 13; (3) PCR duplicated read pairs were removed using "clone_filter" in Stacks. The filtered reads were aligned to the chromosome-level assembly of C. nasus (NCBI assembly accession: GCA_007927625.1) using BWA MEM v0.7.15-r1140 (Li 2013 with default parameters. Following the alignment, SNPs were called using a Bayesian approach as implemented in the package SAMtools v1.9 (Li, et al. 2009). High quality SNPs for downstream analyses were filtered using the following criteria: (1) present in at least 12 individuals for each population; (2) depth of coverage ≥ 6; (3) SNP overall quality score ≥ 30 and genotyping score ≥ 15; (4) observed heterozygosity for each population ≤ 0.5; (5) global minor allele frequency (MAF) ≥ 0.05; (6) retain the SNP with local MAF ≥ 0.2 in any of the five populations but failed to meet the criteria of the global MAF ≥ 0.05. SNPs VCF file was converted to other formats using PGDspider v2.1.1.5 (Lischer and Excoffier 2012) or in house Perl script. All the scripts used are available at GitHub.
For population genetic and demographic analyses that assume a set of neutral and unlinked markers, we created a reduced data set as following: first, any SNPs that were identified as outliers by either of the two detection methods (see Methods and Results below) in any population pair were removed; second, we removed all SNPs on the two chromosome inversions (see Methods and Results below); third, only one SNP was kept in a 10 Kb region to remove putative LD using VCFtools (Danecek, et al. 2011). This neutral and LD-pruned data set was hereafter referred to as the neutral data set and, unless otherwise stated, was used as the primary data set for population genetic analyses (PCA, Admixture, NJ tree and FST) and demographic analysis.

Summary statistics and population genetic structure
Genetic statistics, including observed (HO) and expected (HE) heterozygosity of SNPs for each population were estimated using populations in Stacks. FST among all populations/between each pair of populations were calculated with Arlequin v3.5.2.2 and their significance were determined using 10,000 permutations (Excoffier and Lischer 2010). The significance was adjusted using FDR-BY method. Effective population size (Ne) of each population was estimated using the linkage disequilibrium method in NeEstimator v2.1 (Do, et al. 2014).
Population structure was examined and visualized using three approaches. Fist, the model-based program Admixture v1.3.0 (Alexander, et al. 2009) was used to determine population structure. K from 1 to 6 were applied with 10 replicates for each K, the best K was determined using StructureSelector Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on 06 November 2020 (Li and Liu 2018) using both cross-validation and Puechmaille methods (Puechmaille 2016). Second, principal components analysis (PCA) implemented in the R package "SeqVarTools" (Gogarten, et al. 2020) was performed to visually explore patterns of allelic variation. Third, a NJ tree was constructed based on the pairwise Weir and Cockerham's FST values using the R package "ape" (Paradis and Schliep 2019).

Demographic analysis
In order to test whether the four freshwater resident populations were derived independently from the common anadromous population, we performed the demographic analysis using Approximate Bayesian Computation approach as implemented in DIYABC v2.1.0 (Cornuet, et al. 2014). The Luoma Lake population was excluded from this analysis, as this population was obviously derived from Hongze Lake population. Thus, four populations were retained in the analysis, including one ancestral anadromous population from the Yangtze River Estuary and three derived freshwater resident populations from Taihu Lake, Chaohu Lake and Hongze Lake. For all simulations of the eight demographic scenarios (Supplementary Fig. S5), 2000 SNPs randomly chosen from the neutral unlinked dataset were used. Three replicates were applied for simulations of each demographic scenarios. For simplification, each simulation was performed with 1e6 runs, resulting in a total of 3×8×1e6 runs. Models comparison was performed using the logistic approach in DIYABC (see supporting information Fig. S5 for further parameter details of each model).

Identification of outlier SNPs
Loci that displayed elevated divergence between the anadromous and resident forms were identified by two methods. First, we performed the Fisher's exact test (FET) for the difference in allele frequency on the whole dataset, a P-value < 1e-4 (i.e., -log10(P) > 4) was used as the threshold after examining the Manhattan plot (Fig. 3A), which corresponding to a mean FDR level of 0.005 (max = 0.007).
Second, we used the R package pcadapt (Privé, et al. 2020) to detect SNPs involved in biological adaptation. Pcadapt assumes that candidate markers are outliers with respect to how they are related to population structure based on PCA, which can handle admixed individuals and does not require grouping individuals into populations. We set K = 2 to reflect population structure, and any SNP with a q-value ≤ 0.1 was considered as an outlier. To reduce the possibility of false positives, only the SNPs that were identified as outliers using both methods and shared by all four population pairs were considered as candidate outliers. We defined the allele of which the frequency increased in all the derived freshwater populations as the freshwater favored allele (FWA) for candidate outlier SNPs. We then calculated the frequency of FWA in the ancestral anadromous and the derived freshwater resident populations for the candidate outlier SNPs, as well as their shifts in the four population pairs.

Putative inversion region identification
As clusters of loci with unusually high LD might be generated by chromosomal rearrangements, such as inversions, we detected clusters of SNPs in high LD (outlier clusters, OCs) using the linkage disequilibrium network analysis as implemented in the R package "LDna" (Kemppainen, et al. 2015).
All populations were pooled prior to calculating LD, thereby creating sample admixture LD. The LD (r 2 ) values between pairs of SNPs were calculated individually on each chromosome by VCFtools using the whole dataset with MAF of 0.1. The r 2 matrix was then used for LDna analysis. There are two key parameters that can be set in LDna analysis. The minimum number of edges |E|min, which corresponds to the minimum number of connections among the vertices of a cluster. This parameter controls the minimum number of SNPs within an OC. After several preliminary runs, |E|min = 30 was applied to represent a compromise between detecting clusters large enough to represent chromosomal rearrangements and avoiding noise that result from physical linkage within chromosomes. Another key parameter φ controls the minimum LD threshold above which the median pairwise LD within a cluster is higher than the inter-cluster LD for the group of SNPs to be considered as an OC. This parameter was selected by initializing its value = 2 then increased the value of φ by one in each iteration until no more LD clusters were obtained for at least three times within one chromosome. Only single-outlier clusters (SOCs) with a minimum of 30 SNPs were retained, and clusters with a low median LD (r 2 < 0.3) were also discarded. The sizes of each SOCs were defined as the most extreme positions of the SNPs included in each SOC (except for LG22, see Results section for details). The resulting SOCs were then examined in the downstream analyses. SOCs can arise due to various reasons, such as chromosome inversions, spatial population structure, or local adaptation. Inversions can be identified by detecting groups of genetically distinct individuals that correspond to different karyotypes. The suppressed recombination between arrangements due to inversions will result in the presence of three distinct groups of individuals, which corresponded to three karyotypes (AA: homokaryotypes for the reference arrangement, AB: heterokaryotypes with both reference and alternative arrangements, and BB: homokaryotypes for the alternative inverted arrangement). The second group with karyotype AB should have the highest heterozygosity relative to both the homokaryotypes that should have reduced heterozygosity. Thus, three groups will be separated along the axis of PC1 of PCA based on all SNPs within an LD cluster for inversions. Chromosome inversions were then confirmed by PCA and heterozygosity analyses for each SOC identified by LDna. The rearrangement karyotype of each individual was determined using the "k-means" algorithm as implemented in R (R Core Team 2020). Based on individual karyotype assignments, we calculated the frequency of the reference and alternative arrangement of inversions (allele "A" and "B") and the frequency of the three karyotypes (AA, AB and BB) for each population.

Functional enrichment of chromosome inversions
Gene models and annotations for genes within the two putative chromosome inversion regions were extracted using BEDtools (Quinlan and Hall 2010). In order to identify the significantly enriched Gene Ontology (GO) terms, we firstly aligned the proteins sequences of C. nasus to the protein database of Danio rerio, then we retrieved the related GO terms from the Gene Ontology Annotation (GOA) database (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/). Enrichment analysis for GO terms in each candidate inversion region was performed by the R package "topGO" (Alexa and Rahnenfuhrer 2020) using the "weight01" algorithms.

Data Availability
The sequencing data that support the findings of this study are openly available in the NCBI Sequence Read Archive (SRA) under BioProject Accession No. PRJNA553348. All the customed codes used in this study are available at https://github.com/lyl8086/Coilia_nasus_parallel_evolution.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.       LG6 q q q q q q q q q q q q q q q q q − log 10 (P) B ps://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa290/5955839 by guest on