Gene Copy Number Variation Does Not Reflect Structure or Environmental Selection in Two Recently Diverged California Populations of Suillus brevipes

Gene copy number variation across individuals has been shown to track population structure and be a source of adaptive genetic variation with significant fitness impacts. In this study, we report opposite results for both predictions based on the analysis of gene copy number variants (CNVs) of Suillus brevipes, a mycorrhizal fungus adapted to coastal and montane habitats in California. In order to assess whether gene copy number variation mirrored population structure and selection in this species, we investigated two previously studied locally adapted populations showing a highly differentiated genomic region encompassing a gene predicted to confer salt tolerance. In addition, we examined whether copy number in the genes related to salt homeostasis was differentiated between the two populations. Although we found many instances of CNV regions across the genomes of S. brevipes individuals, we also found CNVs did not recover population structure and known salt-tolerance-related genes were not under selection across the coastal population. Our results contrast with predictions of CNVs matching single-nucleotide polymorphism divergence and showed CNVs of genes for salt homeostasis are not under selection in S. brevipes.

fungi copy-number variants Suillus salt tolerance mycorrhiza Copy-number variants (CNVs) are genes or genomic regions of varying size with different numbers of copies across individuals. Copy-number differences can arise through genomic rearrangements (such as non-allelic homologous recombination and nonhomologous end-joining (Gu et al. 2008;Conrad et al. 2010;Lupski and Stankiewicz 2005), stalled replication forks (Hull et al. 2017), and DNA circularization (Møller et al. 2015). CNVs can be population-specific and have a large impact on population dynamics. For example, selection has been shown to efficiently remove harmful CNVs in large populations of malaria parasites and less efficiently in smaller populations (Cheeseman et al. 2015). Population structure and selection histories can be recovered in population-level analyses of CNVs. In humans, CNVs were shown to reflect the population structure detected from single-nucleotide polymorphisms (SNPs) divergence (Jakobsson et al. 2008;Conrad et al. 2010;Redon et al. 2006). Even though CNVs mirror population structure patterns recovered from single-nucleotide polymorphisms (SNPs) in many organisms including humans, in the grass Brachipodium distachyon (Gordon et al. 2017), in fungal pathogens Zymoseptoria tritici and Aspergillus fumigatus (Hartmann and Croll 2017;Zhao and Gibbons 2018), in high-altitude adapted Chinese indigenous cattle (Zhang et al. 2020), this is not necessarily always the case. For example, a study on an American lobster population across an inlet temperature gradient showing no population structure based on SNPs found CNVs to be better predictors of selection than SNPs and a mismatch between the pattern recovered by SNPs and CNVs (Dorant et al. 2020).
Discovering CNVs across individuals is important because variants causing drastic changes in phenotypes can be adaptive and have major evolutionary consequences. Selection can modulate protein dosage (where increased copy-number leads to increased amounts of gene product in the cell) or produce functional divergence across gene copies resulting in differentially adaptive phenotypes (Schrider and Hahn 2010;Kondrashov 2012). CNVs, and in particular gene multiplications, are known to be important in environmental adaptation . Gene copy number variation is known to be involved in adaptation to environments characterized by extreme temperatures, toxic heavy metals, and high salinity (Kondrashov 2012;Oh et al. 2012). For example, the expansion of the antifreeze glycoprotein gene family allows cod to withstand freezing temperatures in the Antarctic (Chen et al. 2008) and increased numbers in heat-tolerance related gene copies confer heat tolerance in strains of Escherichia coli (Riehle et al. 2001). The evolution of tolerance to heavy metals can also occur through CNVs , including increased number of copies and expression levels of metal homeostasis genes in plants (Craciun et al. 2012;Talke et al. 2006) and fungi (Ruytinx et al. 2019;Bazzicalupo et al. 2020). Salt tolerance can also be associated with ion transporter gene CNVs and has been documented in several plants (Huang et al. 2008;Wu et al. 2012) and in the yeast Saccharomyces cerevisiae, where increased levels of ploidy and expression are associated with tolerance to high salinity (Dhar et al. 2011).
Fungi lack easily measurable phenotypes, making studying fungal environmental adaptation challenging (Branco et al. 2017;Branco 2019). One common approach to detect genomic signatures of selection is to use both genome scans for selection and correlations between genomes and environment, underlying the importance of specific genes in the ability of fungi to persist in specific environments. For the most part, this approach has been most often applied to SNP data (Ellison et al. 2011;Gladieux et al. 2015;Branco et al. 2015;Branco et al. 2017). However, given the evolutionary importance of CNVs, genome scans targeting copy number variation have also been useful in detecting adaptive variants in fungi such as wine strains of the yeast Saccharomyces cerevisiae, the human pathogen Cryptococcus gattii, and the mycorrhizal fungus Suillus luteus (Steenwyk and Rokas 2017;Steenwyk and Rokas 2018;Steenwyk et al. 2016;Bazzicalupo et al. 2020).
Here, we investigate whether gene copy number variation mirrors the patterns of population structure and selection unveiled by genome scans on SNP divergence in Suillus brevipes, a widespread mycorrhizal fungus associated with pines. Previous work documented differentiated populations from coastal and montane regions in California adapted to distinct climatic and salinity conditions (Branco et al. 2015;Branco et al. 2017), with SNP differentiation revealing a strong signature of selection near a salt-tolerance related gene (predicted as Nha-1-like, an Na + /H + antiporter). Furthermore, no SNP variation was found in this gene across coastal individuals strongly suggesting a selective sweep for adaptive alleles allowing colonization of saline environments in coastal California. Nha-1 homologs have been reported as important for salt tolerance in other organisms, including increased Nha-1 expression levels in Arabidopsis thaliana allowing survival in high sodium concentrations (Apse et al. 1999) and Nha-1 mutations conveying salt tolerance in yeast (Nass et al. 1997). In addition, there is evidence for salt tolerant yeast strains having multiple copies of Nha-1 (Prior et al. 1996). We hypothesized that CNVs in the two S. brevipes populations mirror both population structure and selection patterns recovered from SNP analyses, and are prevalent in specific genomic regions related to salt homeostasis. Specifically, we investigated whether CNVs recover the previously described population differentiation between coastal and montane California and if the most differentiated CNVs involved salt homeostasis genes, including the Nha-1-like gene. We found fungal individuals to show many CNVs, however, these CNVs failed to recapitulate the S. brevipes population structure and signatures of selection predicted by SNP analyses.

Gene copy-number estimates per individual and population gene copy-number differentiation
We estimated gene copy number variation from the processed Illumina reads of 27 S. brevipes whole genomes from coastal (11 individuals) and montane (16 individuals) California populations previously analyzed in Branco et al. (2015). Read data are available in GenBank (see Branco et al. (2015) for SRA codes of individuals and reference genome). To detect copy-number variation across populations, we estimated the number of copies for 250 base pair windows in each individual using the software Control-FreeC (Barillot et al. 2011). Suillus brevipes is a dikaryotic fungus which means two haploid genomes inhabit the same cell, we therefore assumed diploidy of our samples and followed Steenwyk et al. (2016) for other software settings. Control-FreeC aligns the reads to the reference genome, normalizes them assuming diploidy, and estimates deviations based on the read count. Only genes present in the reference are estimated for copy-number deviations and genes absent from the sample are counted as losses. We define 'gains' as regions estimated to have more than two copies and 'losses' or 'absences' as regions that are estimated to have one or zero copies of a gene. This method based on normalized read counts does not allow us to truly assess if the copies of a particular gene are found in the same genomic locations or other parts of the genome. To identify copy number differentiated regions we calculated the amount of copy variance using V ST following Steenwyk et al. (2016) (code with V ST formula is available from https://github.com/abazzical/sbrevCNV). Given the reference genome is rather fragmented (Branco et al. 2015), we report results for only the 100 largest scaffolds of the genome for all subsequent analyses. The total reference genome length is 52,222,250 bp and scaffolds 1-100 represent 54.7% (28,583,750 bp). As scaffolds in reference genomes are numbered according to their size, we observed an increase in the V ST values as the size of the scaffold decreased possibly indicating some systematic error in copy number estimate when the reference scaffold is more fragmented. We therefore decided to only use the first 100 scaffolds (Fig S1-2).
Clustering individuals based on whole-genome copynumber variants (CNVs) To investigate whether copy number variation recovered the population differentiation found in (Branco et al. 2017;Branco et al. 2015), we performed Principal Coordinates Analyses (PCoA) based on gene copy number estimates. The goal was to assess if individuals from the coastal and montane populations clustered separately as seen with the SNP data (Branco et al. 2015;Branco et al. 2017). To this end, we used matrices of estimates of per-individual gene copy numbers in 250 bp windows across all samples using the Bedtools software (Quinlan and Hall 2010;Quinlan 2014). We compiled four distance matrices by filtering the raw matrix in different ways. One matrix included all CNVs of coding DNA sequence (CDS) regions, a second included only the top 1% differentiated CDSs. We also filtered the matrices based on the number of copies: a third matrix only included regions that had at least one 'gain', and a fourth matrix included only regions that had at least one zero value. All distance matrices and PCoAs were generated using the ape package (Paradis and Schliep 2018) in R (R Core Team 2014). To compare montane and coastal CNV size and numbers we used a Wilcoxon rank sum test. Additionally, to visualize how similar individuals are to one another, we used the distance matrix used in the PCoA of the all the genes with CNVs to build a cluster-based tree using default settings the R package ape (Paradis and Schliep 2018). We were then able to compare our clustering tree with NJ tree based on SNPs from (Branco et al. 2015).
Gene ontology enrichment analyses of the top 1% diverged CNVs To assess whether the top CNV differentiated genes were enriched in genes involved in salt tolerance, we conducted a gene ontology (GO) enrichment analysis on the top 1% diverged CNVs between the two S. brevipes populations. We used GO terms based on the annotated reference genome of S. brevipes (Branco et al. 2015) and used the App ClueGO (Kirilovsky et al. 2009) in Cytoscape (Shannon et al. 2003) to identify significant GO terms and describe gene functions. We established significance of GO term enrichment by implementing the two-sided hypergeometric test and Benjamini-Hochberg corrected p-values using the ClueGO statistics.

Genes of interest related to salt homeostasis
To identify genes related to salt tolerance we used the Suillus brevipes reference genome (Branco et al. 2015) housed in the MycoCosm database (Grigoriev et al. 2013). We searched the database for key words 'Na + ' and 'sodium'. Gene copy numbers were recovered using Bedtools (Quinlan 2014), and we built heatmaps depicting gene copy numbers in R (R Core Team 2014) using gplots (Warnes et al. 2009) and Rcolorbrewer (Neuwirth 2014).
We performed Fisher's exact tests (R Core Team 2014) to investigate whether there was a significant difference in salt homeostasis-related gene copy number between populations. For each individual, copy numbers of selected genes were scored as a gain, absence, or neutral based on the reference genome. Fisher exact tests were performed with an expected odds ratio of 1:1 and a confidence interval of 95%. We used a Bonferroni correction for multiple testing. All code is available at https://github.com/abazzical/sbrevCNV.

Gene CNVs do not reflect SNP pattern in Suillus brevipes
Contrary to previous SNP results showing clear population differentiation (Branco et al. 2015), there were no significant differences in CNV number or CNV size between montane and coastal populations (Wilcoxon rank sum test P . 0.05). The vast majority of CNVs were losses across the whole genome (Table 1). Close to 10% of CNVs were located in predicted genes across all isolates. Consistent with findings in other fungi (Hartmann and Croll 2017), we found much fewer copy number gains compared to losses both across the genome and in predicted genes. The losses were on average much larger sections of the genome compared to the size of the duplications/multiplications (Table 1, Figure 1).
We also found CNVs do not show the same pattern as SNPs and failed to recover population structure across coastal and montane individuals of S. brevipes in California. This result was clear from both the whole genome CNV PCoA ( Figure 2) and a cluster analysis ( Fig  S4). Neither analyses showed differentiation between the two populations as the individuals did not group based on their population of origin. We found variation in the number of copies of genes in different individuals both spread across the genome and in the salt tolerance related genes (see section below). When looking at only at regions showing a 'gain' or an 'absence' we found the ordinations to be indistinguishable which suggests that there is very little overall difference in gene losses and duplications (Fig S5 and S6).
The top differentiated CNV regions are not enriched in genes related to salt homeostasis Gene copy number variation was spread across the genome, with some regions showing much more differentiation compared to others n■  Figure 1 Suillus brevipes copy number variant sizes (bp) for the whole genome and for genes predicted to be involved in salt tolerance. A and B are sizes across the whole genome, C and D are sizes for predicted genes. A and C are the montane population and B and D are the coastal population.
( Figure 3). The PCoA based on the top 1% CNV differentiated genes showed a the coastal population clustered in a tight group which could be explained by stronger selection or recent bottleneck of the coastal population reported in (Branco et al. 2015) (Fig. S1). The top 1% 250bp windows overlapped with 37,537 annotations from the JGI reference genome and comprised 16,472 'CDS', 18,362 'exon', 1,347 'stop codon', and 1,355 'start codon' regions. Contradicting our expectations, the top 1% of the CNVs were not enriched in genes known to be related to salt tolerance. The gene ontology enrichment analysis on top 1% diverged genes revealed 'ubiquitin' as the sole significantly represented GO term (Table S1). Ubiquitin is present across eukaryotes and regulates proteins in the cell and it is unclear how this gene function relates to environmental or salt adaptation in the two S. brevipes populations (Hershko and Ciechanover 1998).
Absence of copy-number differentiation in salt-tolerance specific genes between Suillus brevipes coastal and montane populations We found 83 genes involved in salt homeostasis across the S. brevipes genome, including 'sodium/calcium exchanger proteins', 'Na + /dicarboxylate, Na + /tricarboxylate and phosphate transporters', and 'Sodium/hydrogen exchanger family'. Based on Fisher's Exact tests, none of these genes showed significant differences in CNVs between coastal and montane populations (Table S2), suggesting salt homeostasis genes are not under selection for different copy numbers in the two populations. However, as shown in the heatmap in Figure 4, there is variation in copy number across individuals of both populations. The salt-related gene identified in Branco et al. (2015) (Nha-1-like gene; ProteinID 1095265) showed a range of copy numbers across individuals ( Figure 4). However, there was no consistent pattern between the two populations, indicating absence of selection on the Nha-1-like gene copy number.

DISCUSSION
We found CNVs did not recover patterns of population structure and selection previously documented in Suillus brevipes. This species is known to be genetically differentiated across coastal and montane California and to show genomic signatures of local adaption to high soil salinity (Branco et al. 2015). Contrary to our expectations, we found no evidence of copy-number variation differentiation both across the genome and in salt tolerance related genes. However, individuals in both populations showed CNVs across their genomes  which is consistent with copy-number variation in the ribosomal internal transcribed spacer for individuals of the same lichenized fungal species (Bradshaw et al. 2020).
Given the previous findings in S. brevipes, we hypothesized CNVs involved in salt homeostasis to be under selection, particularly the Nha-1 like gene that was previously documented to be under positive selection. Individuals within a population share demographic history that impacts both SNP and gene copy number variation, making it is reasonable to expect both types of genetic variation display similar patterns of population structure and selection. In fact, such examples are well documented in human populations both across the whole genome (Jakobsson et al. 2008) and in specific genes associated with innate immune function (Ballana et al. 2007). There are also cases of congruence between SNP and CNV patterns in fungi, including in the plant pathogen Zymoseptoria tritici, where gene CNVs and SNP divergence show the same worldwide population structure (Hartmann and Croll 2017;Zhao and Gibbons 2018).
Both biological and technical factors likely contributed for the absence of concordance between CNVs and SNPs in S. brevipes, including recent population divergence, weak selection, and a highly fragmented reference genome. Despite being clearly distinct and isolated by discontinuous pine host presence in the California Central Valley, the coastal and montane S. brevipes populations share a fair amount of ancestral genetic variation indicating a rather recent split (Branco et al., 2015) that might have not allowed for CNV differentiation to establish yet. In addition, weak natural selection can also potentially contribute to the absence of CNV signal. Even though there is clear evidence of local adaptation in S. brevipes from California (Branco et al. 2015(Branco et al. , 2017, the previously detected salt tolerance genomic signature was restricted to a single region including one gene known to be involved in salt homeostasis, Figure 4 Gene copy number variation in Suillus brevipes genes involved in salt homeostasis. The heatmap shows gene copy number variation across 83 genes involved. The color scale from burgundy to blues going through yellows corresponds to numbers from '0' (burgundy) to '12' (blue). (Individuals 1-18 belong to the montane population and 20-34 to the coastal population). The Nha-like gene is indicated with an arrow ','. The protein IDs of each gene are listed on the righthand side of the plot.
suggesting soil salinity might not be a very strong selective pressure for this species. The coastal and montane S. brevipes populations were sampled from locations with distinct soil chemistries (Peay et al. 2010) and the absence of genetic diversity in the Nha-1 like gene in the coastal population was a good indication of soil salinity being a selective pressure for S. brevipes (Branco et al. 2015). However, it is possible that the coastal Californian soil salt content is not sufficiently high to lead to genetic signatures of adaptation on CNVs. Conversely, selection on salt-related CNVs could also be weak because the coastal variant of the sodium transport gene detected by Branco et al. (2015) is sufficient in providing an advantage in coastal environments. Finally, S. brevipes is not a model organism and has limited available genomic resources that could have prevented recovering a stronger CNV signal. The reference genome for this species is highly fragmented (assembled in 1550 scaffolds) and its annotation incomplete and based on sequence similarity-based gene predictions (Kuo et al. 2014). These resources allow for unveiling considerable aspects of S. brevipes biology but might have prevented the detection of key genes involved in environmental adaptation. We investigated 52% of the genome (included in the 100 largest scaffolds) and there might be adaptive CNVs related to salt tolerance in the remainder of the genome that remained undetected.
In conclusion, our results complement findings from previous studies on environmental adaptation in the mycorrhizal fungus S. brevipes (Branco et al. 2017;Branco et al. 2015) and more broadly contribute a previously unreported scenario of CNVs not reflecting patterns of population structure or selection displayed by SNP variation.

ACKNOWLEDGMENTS
We would like to thank Jennifer Lachowiec for helpful comments on the manuscript.