Genome-wide association analysis and pathway enrichment provide insights into the genetic basis of photosynthetic responses to drought stress in Persian walnut

Abstract Uncovering the genetic basis of photosynthetic trait variation under drought stress is essential for breeding climate-resilient walnut cultivars. To this end, we examined photosynthetic capacity in a diverse panel of 150 walnut families (1500 seedlings) from various agro-climatic zones in their habitats and grown in a common garden experiment. Photosynthetic traits were measured under well-watered (WW), water-stressed (WS) and recovery (WR) conditions. We performed genome-wide association studies (GWAS) using three genomic datasets: genotyping by sequencing data (∼43 K SNPs) on both mother trees (MGBS) and progeny (PGBS) and the Axiom™ Juglans regia 700 K SNP array data (∼295 K SNPs) on mother trees (MArray). We identified 578 unique genomic regions linked with at least one trait in a specific treatment, 874 predicted genes that fell within 20 kb of a significant or suggestive SNP in at least two of the three GWAS datasets (MArray, MGBS, and PGBS), and 67 genes that fell within 20 kb of a significant SNP in all three GWAS datasets. Functional annotation identified several candidate pathways and genes that play crucial roles in photosynthesis, amino acid and carbohydrate metabolism, and signal transduction. Further network analysis identified 15 hub genes under WW, WS and WR conditions including GAPB, PSAN, CRR1, NTRC, DGD1, CYP38, and PETC which are involved in the photosynthetic responses. These findings shed light on possible strategies for improving walnut productivity under drought stress.


Introduction
Walnut (Juglans regia L.) was domesticated in ancient Persia and is today cultivated for its edible nuts throughout temperate and semi-arid regions from Asia to Europe and the Americas [1]. Iran is a prominent centre of diversity for walnut, and ranks third in in-shell walnut production after China and the United States [2]. Walnut production in Central Asia and across the globe is constrained by abiotic stresses, particularly drought, heat, and salinity [3,4]. Drought is likely the most challenging abiotic stress for walnut, with ever-increasing global water scarcity driving large production losses [5][6][7][8]. Therefore, understanding the physiological and molecular mechanisms of drought tolerance in walnuts has become more important world-wide, with more extended drought periods expected in the coming decades. Walnut populations adapted to their native habitats around Iran offer an opportunity to better understand walnut responses to drought stress. So far, most studies conducted in walnut have focused on physiological aspects of drought stress, whereas molecular mechanisms are less well documented [3]. Therefore, a priority task for accelerating walnut improvement is deciphering the molecular genetic basis of drought related traits.
Photosynthesis is a key physiological mechanism involved in adaption to abiotic stresses and regulation of plant development [9]. Under drought conditions, photosynthesis decreases due to stomatal closure, decreased CO 2 availability in the chloroplast, and decreased carboxylation efficiency [10]. Excess light energy during drought-induced stomatal closure can cause serious damage to the plant through generation of reactive oxygen species (ROS) [11]. Plants have evolved diverse protective strategies to cope with excessive light, including non-photochemical quenching (NPQ), antioxidant production, and the regulation of electron transport to moderate ROS formation or detoxify ROS after they form [12]. Because excess light energy absorbed by chlorophyll can either be dissipated through NPQ or re-emitted as fluorescence [11], chlorophyll f luorescence analysis is a powerful tool for tracking f luxes of light absorption by chlorophyll through the electron transport chain [13]. The "OJIP" test represents a fast and non-destructive analysis of polyphasic chlorophyll f luorescence kinetics that has been employed for quick and precise assessment of biophysical aspects of photosynthesis under abiotic stress [14]. The OJIP test, together with gas exchange measurements, have been used successfully to study the photosynthetic apparatus of various tree crops, under abiotic stress conditions, including walnut [7,8]. Despite extensive physiological studies, the genetic basis underlying variation in walnut photosynthesis under both water-stress and re-watering conditions remains largely unknown.
A first-draft reference genome of Persian walnut was released in 2016 (Chandler v1.0 [15]), followed by the development of a high-density Axiom™ J. regia 700 K SNP genotyping array [16] that facilitated advanced genomic studies, including QTL mapping and genome wide association studies (GWAS). These new genomic tools have been extensively used for investigating genomic diversity and association mapping in walnut [17][18][19][20][21]. However, there are few studies on the genetic basis of physiological traits in walnut [3,5]. Recently, two annotated, chromosome-level assemblies of the walnut genome [22,23] have been released, enabling SNP identification at chromosome scale and the application of genomic tools in plant breeding programs. SNP arrays provide reliable and robust markers for a multitude of applications in breeding programs and population genomic studies. However, SNP arrays are species-and population specific and ascertainment bias is one of their main drawbacks [24,25]. SNP array development also requires prior knowledge, and has a relatively high cost [24,25]. On the other hand, genotyping-by-sequencing (GBS) and restriction site associated DNA sequencing (RADseq) approaches have the potential to generate large marker datasets at low cost with minimal ascertainment bias, although large amounts of missing data and heterozygote undercalling are significant drawbacks that can only be partially addressed by imputation [24,25]. Therefore, GBS offers a cost-efficient alternative or complement to high-throughput genotyping arrays for gaining genomic information [25].
GWAS successfully identified many SNPs underlying a wide range of traits in plants. However, for complex or polygenic traits, the statistical power of GWAS for identifying variants of small effect is restricted by the stringent levels set for significance threshold and by insufficient numbers of high-frequency polymorphisms identified in most panels [26,27]. So, many small effect SNP markers are always ignored and most of the genetic variants contributing to the trait remains hidden [28]. Further, since many associated SNPs are noncoding it can be problematic to identify the molecular mechanisms by which they may act. Pathway or gene set enrichment analysis as a complementary method to GWAS can help tackle the aforesaid problems through assessing modules of functionally related genes instead of focusing only on one or a few markers that are most significantly associated [26,27]. Therefore, this approach by pooling information across many genetic variants (SNPs) can identify potentially relevant biological pathways or molecular mechanisms even when individual SNPs fail to reach a stringent significance threshold.
In this study, we take advantage of natural variation in local walnut populations of Iran to investigate the genetic control of photosynthetic-related traits under wellwatered (WW), water-stressed (WS), and water-recovery (WR) conditions, combining GWAS with network and pathway enrichment analyses. Given that the expression of physiological and photosynthetic traits in Persian walnut is under strong genetic control, we hypothesized that locally adapted Persian walnut populations would express different levels of trait plasticity under waterstress (mild and severe) and re-watering conditions. Our main objectives are: (1) to assess natural genetic variation and phenotypic plasticity in photosynthetic traits in a diverse collection of walnut trees under waterstress and re-watering conditions; (2)to uncover genomic regions contributing to photosynthetic trait variation through GWAS; and finally (3) to identify key pathways and hub genes related to photosynthesis under wellwatered, water-stressed and recovery conditions through pathway and network analysis.

Natural variation in photosynthetic traits of Persian walnut populations
From the 150 walnut families collected from major walnut-growing regions of Iran (Table S1), 30 photosynthetic-related traits, largely classified into two main categories (gas exchange and chlorophyll fluorescence measurements), were evaluated in walnut plants grown under control, water-stress, and recovery conditions ( Figure S1). A brief description of calculations for each measured phenotype is provided in Table 1.
High phenotypic variation was observed among families for all the traits measured. Most traits revealed a near normal distribution (Figure 1; Figures S2-3). There was significant (P ≤ 0.001) genotypic variation in both Table 1. Calculations and definitions of water relations, gas-exchange and f luorescence (Strasser et al., 2000(Strasser et al., , 2004    We observed several significant correlations among photosynthetic traits under drought stress (Figure 2A; Figures S4). For instance, drought stress tolerance index (DS) was negatively correlated with the gas exchange parameters g s , C i , T r , and P n but positively correlated with the F V /F M , RWC, and WUE ( Figure 2A). Our results also showed a negative correlation between WUE and C i , g s , and T r . The photosynthetic trait correlations under recovery conditions were almost in line with the DSI of the studied traits. Furthermore, we observed significant regional differentiation in gas exchange  Figure S5). The highest WUE under severe drought stress was found in Markazi and Isfahan populations.
We performed principal component analysis (PCA) on all photosynthetic traits of the 140 families during stress to further explore the key parameters and provide an integrated view of the relationships among traits within populations ( Figure 2B; Figures S4). PCA using DSI (value of trait under WS / value of trait under WW) of the studied traits showed that the first five components (PC1-5) cumulatively explained more than 90% of the total variation for the photosynthetic traits across the panel under severe drought stress ( Figure 2B). The first principal component (PC1), explaining more than 39% of the total variation, was associated with photosynthetic traits; positively with F V /F M (78%), WUE (58%) and RWC (57%), and negatively with Ci (66%), g s (66%), T r (65%), and P n (54%) ( Figure 2B). Since PC1 has a positive and high correlation with water relation parameters and F V /F M index it could be viewed as a quality of plant water status and photosynthetic efficiency under severe drought stress. As a result, PC1 can be considered as a drought tolerant component in our studied walnut panel. On the other hand, since previous studies in plants have shown that positively correlated traits with PC1 (F V /F M , WUE and RWC) have higher heritability than other studied traits, they can be used as biomarkers for selection of drought tolerant genotypes in future studies. The second principal component (PC2), explaining more than 15% of the total variation, was positively associated with phenotypic variation of WUE (57%) ( Figure 2B).

SNP calling and population structure
From the 95 mother trees genotyped using both the SNP array (MArray) and GBS (MGBS), and the 150 families genotyped through GBS (PGBS), 94, 87, and 136 gave good quality data, respectively, and were used as three separate panels for further genomic analysis. The Arrayscored SNPs were categorized into six default groups of Affymetrix Power Tools (APT) according to clustering performance as follows; 1) Poly High Resolution (PHR), 2) Mono High Resolution (MHR), 3) No Minor Homozygote (NMH), 4) Call Rate Below Threshold (CRBT), 5) Off-Target Variant (OTV) and 6) Other as described by Arab et al (2019) [17]. PCA and population structure estimates for each set of panels genotyped through GBS (PGBS and MGBS; Figure 2C and D) divided our panels into four main clusters based on their geographical locations. Data from the Axiom J. regia 700 K SNP array (Arab et al., 2020) also classified mother trees into four main groups. These results confirm that our walnut panels (MArray, MGBS and PGBS) comprise mainly four genetic clusters. Therefore, the optimal number of genetic groups was chosen as four for association mapping studies to control the family structure. LD decayed (shown by r [2] < 0.2) within 10 kb across the genome as described by Arab et al. (2020) [3].  Table 1 for the definition of measured traits.

Genome wide association study
After filtering for minor allele frequency (MAF > 5%) and missing rate (< 10%), we obtained three SNP panels including 295 685 polymorphisms (MArray), 40 828 SNPs (MGBS), and 43 607 SNPs (PGBS). For ease of reading and understanding, we divided the GWAS results into six categories, based on the correspondence of the genotyping approach and the studied trait results. Also, for each of the categories, we classified the results into 5 distinct groups based on the experimental conditions (WW, WS and WR) and phenotypic plasticity of traits (DSI = WS/WW * 100; DRI = WR/WW * 100; see Methods) ( Table 2; Table S6).
Given the significant (α/n) and suggestive (1/n) thresholds (where n is marker number; see Methods), we identified 578 and 1543 unique SNPs, respectively, for the studied phenotypic traits ( Table 2, Table S6, Supplementary data sets S1-S3). Our results found a total of 198 (34%), 228 (40%) and 152 (26%) significant associations for at least one photosynthetic-related trait under all conditions using the MArray, MGBS and PGBS datasets, respectively (Table 2; Figure 4). We also identified a total of 544 (35%), 524 (34%) and 481 (31%) suggestive SNPs associated with all photosynthetic traits under all conditions through the MArray, MGBS and PGBS datasets, respectively (Table S6; Figure S6). One and five of the suggestive SNPs identified were in common between the MArray and MGBS, and the MGBS and PGBS datasets, respectively. These results indicate that the strong significance of these marker-trait associations identified using two different genotyping methods.  Table 2. Summary of significant marker-trait associations identified by GWAS analysis using two approaches (FarmCPU and MLMM) for all the photosynthetic traits across six categories (A-F) in well-water (WW), water-stress (WS), water-recovery (WR) conditions and for drought stress index (DSI) and drought recovery index (DRI) of traits as a relative measure  Supplementary data set S4). We also found 59, 47, 16, 24, and 30 SNPs associated with gas exchange parameters under WW, WS, and WR conditions and phenotypic plasticity of traits (DSI and DRI), respectively. We observed only a few significant trait-locus associations across experiments, indicating different responses of walnut to drought stress and re-watering. In addition, we detected 1699 suggestive SNPs associated with gas exchange (n = 1104) and chlorophyll f luorescence (n = 565) phenotypes (Supplementary data set S4). The highest number of photosynthetic-associated SNPs was identified under WW (508), followed by WS (407), DSI (221), DRI (300), and WR (263). More details are summarized in Tables S6-S8, and Supplementary data set S5.
Significant associations were identified on all the walnut chromosomes. The greatest number of associations was found on chromosome 7 (60 SNPs) and the lowest number of associations was found on chromosome 14 (17 SNPs). We also found SNPs that were associated with multiple traits. A total of 11, 28, and 113 significant (p < 0.05/n) SNPs were associated with more than three, two, and one trait, respectively. Among these, the marker AX-170754326 on chromosome 2 was simultaneously associated with thirteen traits related to the chlorophyll f luorescence. We also identified a significant SNP (S11_15758875) located on chromosome 11 associated with seven traits related to chlorophyll fluorescence. Using the suggestive threshold (p < 1/n), 33, 47 and 95 SNPs were associated with more than five, four and three traits, respectively. In particular, the SNP AX-170754326 was simultaneously associated with twenty traits related to chlorophyll fluorescence. Also, the locus S4_5195821 on chromosome 4 was associated with fourteen chlorophyll fluorescence related traits. Our results are in line with the quantitative (multigenic) nature of drought tolerance and the strong correlation among the studied photosynthetic traits.
By further lowering the P-values threshold to 9.95.0 × 10 −5 (-log 10 P = 5), we also observed clusters of linked SNPs associated to a single trait. For example, twenty-three, twenty-three, twenty-two, nineteen and twelve suggestive SNPs on chromosome 4 were associated with DRI of F V , PC2, ϕ Do , F V/ F M and F M/ F 0 , respectively. Also, under normal condition, thirty and twentynine suggestive SNPs on chromosome 4 were identified for the CE and g s traits. For DSI of PC1, F V/ F M , F M/ F 0 , ϕ Do , C i and RWC were identified twenty-six, twenty-four, twenty-three, seventeen, fifteen and fifteen suggestive SNPs on chromosomes 1, 2, 7, 2, 5 and 8 respectively. Under water stress condition, eighteen, fifteen, fourteen, eleven, and ten suggestive SNPs on chromosome 2 were identified for the F M/ F 0 , ABS/RC, PC1, F i, and F M respectively. On the other hand, forty-eight, forty-seven, thirtynine suggestive SNPs on chromosome 8 were linked to

Candidate gene identification for significant SNPs
Candidate genes underlying each measured trait were selected, based primarily on the significant and suggestive SNPs within the gene or in the f lanking regions (Tables 3-4). A comprehensive list of identified genes is provided in Supplemental data sets S6-S7. Our BLASTX results showed out of 578 and 1543 significant and suggestive SNPs identified by GWAS, 67 (11%) and 204 (13%) SNPs were located inside the gene, respectively (Supplementary data set S6). When we searched 20 kb windows around the significant and suggestive SNPs associated with photosynthetic traits, 382 (66%) and 1043(68%) SNPs were functionally annotated based on the best/top BLAST alignments for each SNP (Supplementary data set S7). Most of the significant or suggestive SNPs located within or nearby the genes which were involved in the regulation of photosynthesis and drought tolerance. More details are described in supplementary file.

Gene-set enrichment and network analysis
We extracted all genes within 20 kb around the significant and suggestive SNPs. Given a P-value of 9.95 × 10 −5 (-log 10 P = 5), we detected 5907 (MArray), 1588 (MGBS) and 1497 (PGBS) SNPs associated with all the photosynthetic traits under all conditions ( Figure 9A). Of these, 61 and SNPs were common between MGBS and PGBS, and 9 SNPs between MArray and MGBS ( Figure 9A). We identified 6254 candidate genes adjacent to the SNPs of Array dataset, while 1621 and 1615 candidate genes were detected near the MGBS and PGBS SNPs, respectively ( Figure 9B). Of these, 874 candidate genes were identified by more than one GWAS dataset (MArray, MGBS, and PGBS). We also found that 67 genes were common between the three datasets ( Figure 9B). On the other hand, 3123, 4098 and 2268 SNPs associated with studied traits under WW, WS and WR conditions were located within or 10 kb upstream or downstream of 3376, 4468 and 2582 genes in the walnut gene annotation v2.0, respectively ( Figure 9C and D). The identified genes were characterized, and various KEGG pathways and GO terms were found to be particularly relevant to drought tolerance and photosynthesis.
We identified 96, 104 and 96 KEGG pathways using identified genes associated with photosynthetic traits under WW, WS and WR conditions, respectively, of which        15, 26 and 20 were significantly enriched ( Figure 10A). Enriched pathways were related to metabolic processes including carbohydrate metabolism, amino acid metabolism, lipid metabolism, energy metabolism and signal transduction. We found that few pathways were shared by genes associated with photosynthetic traits in WW, WS and WR conditions, and several unique pathways were identified for genes associated with photosynthetic traits in each condition ( Figure 10A). For instance, a few KEGG pathways related to carbohydrate metabolism including pentose and glucuronate interconversions (KEGG:00040), C5-Branched dibasic acid metabolism (KEGG:00660), and N-Glycan biosynthesis (KEGG:00510), were shared by genes associated with photosynthetic traits in all conditions. We found several pathways, including Ribosome (KEGG:03010), inositol phosphate metabolism (KEGG:00562), tyrosine metabolism (KEGG:00350), pentose phosphate pathway (KEGG:00030), Cysteine and methionine metabolism (KEGG:00270), starch and sucrose metabolism (KEGG: 00500), and glycolysis/gluconeogenesis (KEGG:00010) showed an overrepresentation of significant genes associated with photosynthetic traits under WS condition. On the other hand, sphingolipid metabolism (KEGG:00600), sulfur relay system (KEGG:04122), Phosphonate and phosphinate metabolism (KEGG:00440), carotenoid biosynthesis (KEGG:00906), and cyanoamino acid metabolism (KEGG:00460) pathways were unique to genes detected for photosynthetic traits under WR condition. We also identified several others pathways related to lipid and amino acid metabolisms, such as fatty acid biosynthesis (KEGG:00061), nitrogen metabolism (KEGG:00910), tryptophan metabolism (KEGG:00380), and biotin metabolism (KEGG:00780), were specific to genes associated with photosynthetic traits under WW condition ( Figure 10A).
As a complementary approach to the KEGG survey, we found 25, 29 and 19 significant GO terms using identified genes, which were associated with photosynthetic traits under WW, WS and WR conditions, respectively ( Figure S13). GO terms of the protein metabolic process (GO:0019538), lipid phosphorylation (GO:0046834), phosphatidylinositol phosphate kinase activity (GO:0016307), and carbohydrate biosynthetic process (GO:0016051) under WS condition, as well as, response to photooxidative stress (GO:0080183), membrane lipid metabolic process (GO:0006643), response to oxidative stress (GO:0006979) and antioxidant activity (GO:0016209) under WR condition were the important enriched terms ( Figure S13). These results indicated different genetic controls of photosynthetic traits under drought and recovery conditions in Persian walnut.
To provide further insight into the interaction in the pathways related to drought-response related genes, we proceeded beyond enriching pathways to identify highly modulated drought and photosynthesis specific sub-networks. The protein-protein interactions (PPI) were identified using the STRING database based on the genes associated with the photosynthetic traits in each condition separately (WW, WS and WR) and the network was subsequently constructed using Cytoscape ( Figure 10B-D). The most important identified hub genes were directly or indirectly involved in the photosynthesis and drought stress responses ( Figure 10B-D).

Discussion
Given the challenges of climate change and consequent water shortage in Persian walnut production areas, understanding the complex physiological and genetic basis of drought tolerance and adaptation in walnut is becoming increasingly important [3]. Natural variation in photosynthetic traits in Persian walnut is a largely unexploited resource that can provide useful information for breeding or engineering photosynthetic efficiency. In the current study, we explored natural variation in photosynthetic parameters in Persian walnut by combining a common garden approach with GWAS and pathway enrichment analysis. We have identified both genomic regions and pathways that suggest important adaptive mechanisms exist within the walnut populations sampled in this study. Here, we discuss our main findings and their implications for walnut breeding.

Natural variation in photosynthetic traits of Persian walnut populations
Photosynthesis is highly susceptible to drought stress and can be studied by measuring gas exchange parameters or analysing chlorophyll fluorescence [29]. In present study, photosynthetic traits varied widely under WW, WS, and WR conditions, suggesting that genetic improvement of these traits is feasible in walnut. Water stress (WS) and subsequent recovery (WR) treatments significantly affected all traits compared to the well-watered control (WW), in agreement with previous reports in walnut [6,8]. Under water stress conditions most families exhibited decreased net photosynthetic rate (P n ), stomatal conductance (g s ), transpiration rate (T r ), and intercellular CO 2 concentration (C i ), whilst both intrinsic and instantaneous water use efficiency (WUE) increased substantially. These findings are consistent with those of Zhang et al. (2006) [30] and Arab et al. (2020) [3], suggesting that stomatal closure via the ABA-dependent pathway resulted in decreasing water loss and increasing WUE. We also observed high correlation between P n , T r , and g s implying regulation of stomatal aperture affected P n under normal and stress conditions. Our results are consistent with the results of Rosati et al. (2006) [31] who reported that reduced photosynthesis is mainly associated with the stomatal closure, which consequently influences leaf biochemical processes. Furthermore, Our OJIP-test results showed that drought stress caused a significant inhibition of both PSII and beyond PSII electron transport activities. Our finding demonstrated a decrease in the maximal quantum yield of PSII (F V /F M ), and decline of 0 and ϕ Eo under water stress, especially in drought sensitive genotypes, ref lects the accumulation of Q A , indicating blockage of electron transfer from Q A to Q B on the PSII acceptor side, which was reported by Kalaji et al. (2016) [32]. These results are consistent with Liu et al. (2019) [8] who showed that drought stress affects photosynthetic electron transport of walnut plants.

SNP-array and GBS are complementary for understanding genetic basis of photosynthetic response to drought and re-watering in Persian walnut
GWAS identified at least one SNP with a significant or suggestive association with the majority of photosynthetic traits either in the WW, WS and WR conditions or plasticity of the trait across treatments.
There was little overlap between MArray, MGBS, and PGBS results ( Figure 9A). The first reason is that there is very little SNP overlap between array and GBS datasets (∼5 K SNPs). A second reason is that there are different numbers of individuals (94, 87, and 136) for each analysis. In line with Elbasyoni et al. (2018) [24], the lack of agreement between the GWAS analyses with array and GBS data is likely due to ascertainment bias inherent in the Array-scored SNPs because markers were discovered in a diverse panel independent from our walnut genotypes, as well as low coverage data in GBS. In agreement with Negro et al. (2019) [25], we identified more significant or suggestive SNPs associated with photosynthetic traits using the SNP-array compared to GBS. This result can be due to the higher genome coverage of the SNP-array. Based on our small dataset, we conclude that combining both datasets for GWAS may be expected to boost the likelihood of identifying trait-SNP associations. Our results showed the infrequent occurrence of genomic loci with a significant association for a trait under different treatments (WW, WS and WR conditions). This could be explained by substantial genetic-environment interaction for the majority of the traits. These highlight distinct mechanisms of photosynthetic response to drought and re-watering in Persian walnut. Interestingly, we identified far fewer significant and suggestive loci overlapping with plasticity in studied traits. This suggests independent genetic control of the expression of a trait and its plasticity [3].

Functional annotation revealed the complex and distinct mechanisms of photosynthetic response to drought and re-watering in Persian walnut
The photosynthetic traits characterized here are all quantitative traits, so multiple genomic regions with small effects are expected. We first searched for genes containing significant and suggestive GWAS SNPs (Table 2 and S6), and identified several potential candidates underlying photosynthetic trait variation. Here, we highlight some of the most important (Tables 3-4).
A peak SNP on chromosome 10 found to be associated with P n under water stress condition was located within a CLPB3 gene encoding a chaperone protein ClpB3. Previous study in Arabidopsis has demonstrated that ClpB/Hsp100 family of proteins is involved in chloroplast development [33]. On chromosome 10, a significant SNP located within a gene encoding an ethylene-responsive transcription factor (ERF), was associated with DSI of RWC. Many studies demonstrated that several ERFs bind to both GCC box and dehydration-responsive elements (DRE) and act as a key regulatory hub in plant responses to biotic and abiotic stresses [34]. On chromosome 16, we found a significant SNP associated with CE, gs, Tr and PC1 under well-water condition fell in a gene (TL17) encoding thylakoid lumenal protein. Recently lumen proteins shown to play important roles in regulating thylakoid biosynthesis and the activity of photosynthetic protein complexes, especially photosystem II [35]. On chromosome 1, a significant SNP which has been linked with F V , F M , F I , and F J under recovery condition, we found to be located inside a gene encoding a receptor-like protein kinase (RLK) which is involved in abiotic stress responses, including calcium signaling and antioxidant defense system [36]. In addition, on chromosome 2, a SNP associated with DI 0 /RC, ABS/RC, ϕ D0 , F V /F M , ϕ Pav , and PC1 under drought stress fell in a gene encoding cyclin-dependent kinase (CDK). Previous studies have shown that CDKs as core cell cycle regulators play key role in diverse aspects of plant responses to abiotic stress [36]. Our results also showed that several other SNP markers associated with the studied traits were located within the genes encoding different protein kinases, especially receptor-like kinases (RLKs), calcium-dependent protein kinases (CDPKs), and mitogen-activated protein kinase (MAPK) cascades. As a result, it can be concluded that protein kinases play crucial roles in Persian walnut responses to the both of drought stress and recovery conditions through various signal transduction pathways.
On chromosome 4, a significant SNP located within a gene BAG4 encoding protein BAG family molecular chaperone regulator 4, was found to be associated with the F M and F I under water-stress. Among the plant Bcl-2 associated athanogene (BAG) genes, BAG4 has been extensively studied and its overexpression in tobacco plants confers tolerance to abiotic stresses [37]. Three peak SNPs on chromosomes 12, 13 and 3 respectively associated with DSI of DI 0 /RC, DSI of TR 0 /RC and F 0 and DSI of P n , T r , g s and PC1 were located in a gene encoding pentatricopeptide repeat-containing protein, thought to play crucial roles in plant responses to abiotic stresses. Jiang et al. (2015) [38] showed that upregulation of the pentatricopeptide repeat SOAR1 expression in Arabidopsis enhances stomatal closure and plant tolerance to multiple abiotic stresses (drought, salinity and cold). On chromosome 11, a SNP associated with F V under drought stress falls within an ACBP4 gene encoding an Acyl-CoAbinding domain-containing protein that may function as an intracellular carrier of acyl-CoA esters. Du et al. (2013) [39] revealed that overexpression of ACBP2 (ACBP2-OXs) in Arabidopsis confers tolerance to drought by promoting ABA signalling and stomatal closure. On chromosome 6, a wall-associated receptor kinase that may function as a signalling receptor of an extracellular matrix component was found to be linked with the DSI of F 0 , DRI of TR 0 /RC and PC2 under well-water condition. Hou et al. (2005) [40] have shown that various cell wall-associated receptor kinase (WAK) gene family members are involved in abiotic stress responses as well they are required for cell elongation and development.
Interestingly, we found two SNPs on chromosomes 8 and 11 located within the FRS5 gene that encodes the protein FAR1-RELATED SEQUENCE, which has been linked with multiple traits, including T r under rewatering condition as well as ϕ E0 , V J , ψ 0 ET 0 /RC, and TR 0 /RC under well-water condition. On the other hand, of particular interest to us are SNPs/or genes correlated with more than one trait. We searched in windows of ±10 kb around the SNPs associated with multitraits (Supplementary data set S7). For example, GWAS identified two SNPs on chromosomes 2 linked to the protein FAR1-RELATED SEQUENCE, which individually associated with 10 and 8 traits (some in multiple environments). Recently, Ma and Li (2018) [41] have reported that this protein plays multiple roles in an extensive range of biological processes, including oxidative stress responses, chlorophyll biosynthesis, and starch synthesis.
Taken altogether, our functional annotation revealed a high number of identified loci that were either close or within the known genes that play crucial roles in photosynthetic processes, including ABA signalling, regulation of stomatal function, chlorophyll biosynthesis, antioxidant biosynthesis, starch synthesis, lipid metabolism, and transduction of environmental signals. Other candidate genes identified in our study encode transcription factors, such as MYB, WRKY, bHLH, AP2like ethylene-responsive protein, and HSP70, which are involved in drought responses in plant. Our results show that drought tolerance and recovery involve distinct and diverse mechanisms. Their polygenic nature represents a constraint on development of new trait combinations and needs to be considered when attempting to breed drought tolerance walnut rootstocks and cultivars.

Gene-set enrichment analysis identified key pathways involved in photosynthetic responses to drought and re-watering in Persian walnut
We complement our BLASTx results with gene-set enrichment analysis using KEGG and GO databases to identify molecular mechanisms underlying photosynthetic responses and drought tolerance in walnut. Pathway analysis revealed significantly enriched pathways that were linked to photosynthesis and drought stress responses. Our KEGG results indicate that photosynthesis pathway under both WW and WS conditions as well as carbon fixation in photosynthetic organisms pathway under WW and WR conditions were enriched with identified genes associated with photosynthetic traits. Previous studies showed that photosynthesis rate declined under WS in walnut [8], however, photosynthesis activity is essential for plant acclimation to WS [42]. Also, KEGG results showed that several amino acid and carbohydrate metabolism-related pathways such as tyrosine metabolism, cysteine and methionine metabolism and starch and sucrose metabolism were enriched under WS. In addition, GO terms related to signaling pathways such as phosphatidylinositol phosphate kinase activity and carbohydrate biosynthetic process were the important enriched terms under WS condition. It is well-documented that drought stress causes the accumulation of reactive oxygen species (ROS) in the cells, which in turns mediates multiple biological processes such as signal transduction pathways activation, oxidoreductase activities, and carbohydrate metabolic processes that are implicated in regulating the response to drought stress [43]. Additionally, high ROS doses have negative effects on cell protection [4].
Likewise, our KEGG results showed that ribosome pathway was significantly enriched by genes affecting photosynthetic efficiency under WS. In agreement with these finding, previous study in Arabidopsis thaliana suggest that ribosome biosynthesis highly increased in response to drought and recovery [44]. Plant have evolved several protective adaptations to respond to drought by up regulation of a considerable number of transcripts, therefore, they need a high number of ribosomes to translate these transcripts to proteins [45]. Also, it was suggested that plants need a significant number of ribosomes during the recovery phase to renew or repair their proteins. Moreover, biological functional analysis revealed that genes associated with photosynthetic traits under WS mapped to carbohydrate metabolism GO terms.
KEGG finding revealed that sphingolipid metabolism, sulfur relay system, carotenoid biosynthesis, and cyanoamino acid metabolism were enriched by identified genes associated with photosynthetic traits under WR condition. Also, our results showed that the membrane lipid metabolic process, response to oxidative stress and antioxidant activity GO terms were enriched with these genes under WR condition. These findings suggest that plants have begun to grow and repair damaged tissues under WR condition. On the other hand, we found a number of enriched pathways shared between the two experimental conditions including; Galactose metabolism and Amino sugar and nucleotide sugar metabolism between WS and WR conditions, and Fructose and mannose metabolism, photosynthesis, and phenylpropanoid biosynthesis between WW and WS conditions, as well as carbon fixation in photosynthetic organisms between WW and WR conditions. Therefore, in line with previous studies [3,4,6] can be concluded that various metabolic processes are involved in Persian walnut adaptation to drought stress. In agreements with our results it has been suggested that amino acid (i.e. proline) and carbohydrates metabolisms play important roles during the drought stress response in Persian walnut [4,6]. Many studies have found that amino acid metabolism is closely related to drought tolerance [46]. Moreover, several studies have shown that carbohydrate metabolism as one of the key plant processes for absorbing the energy generated during photosynthesis occupies a vital function in drought stress responses in addition to acting as energy sources [47,48]. It was reported that that the increasing of sugars and other compatible solutes, contributes to osmotic adjustment under drought stress [4]. Hence, our results suggest that maybe osmotic adjustment is one of the important mechanisms of response to drought in Persian walnut. Overall, these results suggest that multiple biological processes are involved in drought stress responses as well as amino acid and carbohydrate metabolisms may play important roles in Persian walnut seedling responses to drought stress.

Network analysis highlighted hub genes involved in photosynthetic responses to drought and re-watering in Persian walnut
Among many biological pathways activated in plants under environmental stresses, the photosynthesis and cell growth processes are the most sensitive to drought and recovery. Our network analysis further identified the most important hub genes that are directly or indirectly involved in the complex interaction network linked to photosynthetic and drought-related stress responses. Among these, 15 associated with photosynthetic processes (GAPB, PSAN, and CRR1 under WW condition; PGK1 and NTRC under WS condition; DGD1, CYP38, and PETC under recovery condition), carbon and nitrogen metabolism (PPC1 under WR condition), carbohydrate metabolism (SUS6 and GAPC1 under WS condition),regulation of stomatal movement (PLDALPHA1 under WR condition), and cell growth and development (DXS, RPS13A, and HXK1 under WW and WR conditions) were the most important drought-responsive genes. The GAPA and GAPB genes encode one of the two subunits forming respectively the photosynthetic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and chloroplast localized glyceraldehyde-3-phosphate dehydrogenase, which play crucial roles in plant metabolism and are involved in abiotic stress response [49]. The CRR1 gene is required for both formation and activity of the chloroplast NAD(P)H dehydrogenase (NDH) complex of the photosynthetic electron transport chain [50]. Naranjo et al. (2016) [51] showed NTRC plays an important role in the control of photosynthetic electron transport in Arabidopsis. The RPS1 gene is required for optimal plastid performance and plays an important role in biosynthesis of thylakoid membrane proteins [52]. The PETC gene is a component of the cytochrome b6-f complex, which mediates electron transfer between PSII and PSI, cyclic electron f low around PSI, and state transitions [53]. Overall, these findings (promising candidate genes) will accelerate future efforts aimed at improving walnut drought tolerance.

Conclusion and perspectives
We characterized photosynthetic traits in diverse walnut families (n = 150) grown in a controlled greenhouse under well-watered, water-stressed, and re-watered conditions. GWAS analysis was performed using over 295 K arrayscored and 43 K GBS-scored SNPs. Our main conclusions are the following: (1) Combining two genotyping technologies with different SNP distributions and densities allowed the identification of more marker-trait associations. (2) Identification of different genomic regions under drought stress and drought recovery conditions in walnut suggests that their genetic control is different. (3) Multiple candidate genes previously reported to be associated with photosynthesis and drought tolerance were identified. For the traits obtained from analysis of chlorophyll fluorescence, because of the time and/or labour intensity of data collection, a relatively small number of selected families (60) were evaluated and hence the power of GWAS was reduced. Therefore, a larger walnut population and more SNPs are clearly required to obtain more accurate GWAS results. On the other hand, GWAS of other drought-related traits such as water relations and biochemical parameters in future studies will complement the results of our study. In addition, because of the complex multigenic nature of drought tolerance, the most significant SNPs associated with photosynthetic traits are probably not the true causative loci. (4) The integration of GWAS and enrichment analysis was helpful for identifying promising candidate genes and pathways for further study. Together these findings provide new insight into possible drought tolerance mechanisms in walnut.

Plant material and experimental design:
Plant materials used for this study consisted of a diverse panel of Persian walnut composed of 150 mother trees (local populations; 50-to 500-year-old open pollinated seedlings) from different geographical regions in Iran. This panel is expected to capture most of the genomic variation within locally-adapted populations (Table S1). Each of the sampled populations were located in a distinct habitat with very diverse environmental conditions (e.g. climate, geology, and topography). GPS coordinates and elevation were used to determine climatic parameters of the sampled areas (WorldClim [54]). At least 60 seeds along with leaf samples were collected from each of the 150 mother trees in 2015. A detailed list of mother trees is presented in Table S1.
To evaluation early-lifetime phenotypes of mother trees, and progeny photosynthetic performance under drought condition, collected seeds were established in a common garden. Seeds were stratified to break dormancy and 20 seeds per mother tree (half-sib family) were subsequently planted on 7-liter polyethylene pots (15 cm × 15 cm × 50 cm deep) in a potting mix (2:1:1 (v/v/v), soil:sand:leaf manure). More details are described in the supplementary file and by   [3]. Ten uniform seedlings from each of the 150 families were then selected to initiate a common garden experiment. Two water stress experiments were carried out under greenhouse conditions (25 (±5) • C, 45 (±10) % RH, and photoperiod of ∼16 h) over two years at the Research Greenhouses of the Department of Horticulture, University of Tehran, Pakdasht, Tehran, Iran. In the first experiment, 10 uniform seedlings (6month-old) from each family were randomly assigned to either the well-watered or the water-stressed group. Water stress was applied by withholding and setting three levels of moisture treatment; (i) well-watered (above 75% FC), (ii), mild water-stressed (∼40-50% FC), and (iii) severely water-stressed (∼25-35% FC) (see Figure S1). In the fall when the buds were dormant, 9-month-old sapling were transplanted into 15-liter polyethylene pots (25 cm × 25 cm × 70 cm deep) containing a mix soil as previous. In the second experiment, 15-month-old sapling were arranged in two groups as first experiment. They then were subjected to water stress by withholding and setting three levels of moisture treatment; (i) well-watered (above 75% FC), (ii), severely water-stressed (∼25-35% FC), and (iii) re-watered following severe stress (see Figure S1). Experiments were laid out in a factorial completely randomized design with two factors (family and water treatment) and 2-3 replications.

Drought score index and relative water content (RWC)
Score index was visually graded on a range of 1 to 9 according to the appearance characteristics of the plant (1 to 9 indicates perfectly healthy plants to damaged and dying plants). Fresh leaves samples (ten uniform leaf discs) were collected from each plant, weighed [fresh weight (FW)], and placed in a petri dish filled with distilled deionized water for 24 h. Surface water on the leaves was removed through tissue paper and the leaves were weighed [turgor weight (TW)] and dried at 70 • C. After 24 h, dry weight (DW) of samples was recorded and RWC was calculated as follow: (FW-DW)/(TW-DW) × 100 (Table 1).

Photosynthesis measurements
Gas-exchange photosynthetic parameters were measured three times on the 150 families using an infrared gas analysing system, IRGA (LI-6400, Li-Cor, Inc., Lincoln, NE, USA) under severe water-stress level in the first and second experiments, and at the end of recovery period in the second experiment. Measurements were performed on two fully-expanded, upper-canopy leaflets (fifth leaf in basipetal order) per plant, with controlled atmosphere (∼400 μmol CO 2 mol −1 ; 25± 2 • C and ∼ 50-60% relative humidity) and photosynthetic active radiation of 1200-1500 μmol photons m −2 s −1 . The infra-red gas analyser system (IRGA) was manually adjusted and the levels of CO 2 and H 2 O references were fixed before measurements. For each time-point, measurements were taken on only undamaged leaves during two consecutive days from 9 a.m. till 3 p.m. Net photosynthetic rate (P n in μmol CO 2 m −2 s −1 ), transpiration rate (T r in mmol H 2 O m −2 s −1 ), stomatal conductance (g s in mol H 2 O m −2 s −1 ), and intercellular CO 2 concentration (C i in μmol CO 2 mol −1 air) were recorded from two plants per treatment. P n , T r and g s were used to calculate instantaneous and intrinsic WUE (WUEinst = P n /T r in μmol CO 2 mmol H 2 O −1 ; WUEintri = P n /g s in μmol CO 2 mol H 2 O −1 , respectively). P n /C i ratio (CE) was taken as an estimate of carboxylation efficiency of Rubisco [55].

Chlorophyll fluorescence measurements in the first experiment
In the first experiment, parameters obtained for the 150 families from the analysis of chlorophyll fluorescence were recorded under mild (3 weeks after drought) and severe (5 weeks after drought) stress. For measuring chlorophyll a fluorescence, we used the same leaf immediately after the gas exchange analysis. The measurements were performed in the greenhouse during two consecutive days from 9 AM until 3 PM, using a portable chlorophyll fluorometer (PAM-2500, Walz, Effeltrich, Germany). After a dark-adapted period (30 min) with dark leaf clips, the minimum fluorescence (F 0 ) was measured using weak modulated irradiation light [<0.1 μmol (photons) m −2 s −1 ]. Afterwards, a 800 ms saturating flash at 6000 μmol (photons) m −2 s −1 was applied to determine the maximum chlorophyll fluorescence (F M ), variable fluorescence (F V = F M -F 0 ) and maximum quantum yield of PSII (F V /F M ) using the equation in Table 1.

Chlorophyll fluorescence measurements in the second experiment
In the second experiment, chlorophyll fluorescence parameters of 60 selected extreme families (very drought tolerant to very sensitive) were measured under severe stress (24 days after drought) and recovery (two weeks after re-irrigation). Polyphasic Chl a fluorescence transients (OJIP-test) were measured using a portable fluorometer (Fluorpen FP 100-MAX, Photon Systems Instruments, Drasov, Czech Republic) in the middle part of the sapling in young fully-expanded walnut leaflets with 3 replicates for each treatment (control or drought) after 20 min dark adaptation. To fully ensure that all PSII centers are open, plants were allowed to dark-adapt overnight, and the lights were extinguished in the greenhouse until measurements were concluded pre-dawn (between 1 and 5 a.m.). The fluorescence measurements were taken by a saturating light of ∼3000 μmol m −2 s −1 . Fluorescence intensities were recorded at four time points: 50 μs (O), 2 ms (J), 60 ms (I), and maximum fluorescence at around 1 s (P). Measurements related to the OJIP test were calculated based on the approaches described by Strasser et al. (2000Strasser et al. ( , 2004 [13,29]. The definition of the measured parameters and detailed calculation formulas are listed in Table 1. More details of the OJIP-test are given in the supplementary file.

Calculation of phenotypic plasticity
The response of genotypes to drought stress for all measured traits was expressed as a relative change in waterstress compared with well-water conditions using the drought stress index (DSI) described in Wójcik-Jagła et al. (2013) [56] and calculated as follows: DSI = (value of trait under water-stressed condition) / (value of trait under well-watered condition) × 100. The drought recovery index (DRI) was calculated using the same equation but substituting the value of trait under recovery condition in place of the value of trait under drought condition.

Statistical analysis
Statistical analyses were performed using Minitab software (Minitab, Inc., State College, PA, USA) and R environment (R Development Core Team, 2017) and the related R packages. Descriptive statistics and normality tests were run on both of the data and their residuals, respectively. The results of descriptive statistics were plotted using ggplot2 R package [57]. Analysis of variance of each experiment was performed separately. We applied general linear models (GLM) to test the effect of families (F), water stress or re-watering treatment (T), and their interaction (F × T) on each photosynthetic-related trait under mild and severe drought stress, and subsequently re-watering. Pearson correlation analysis was conducted via factoextra package [58] in R.

DNA extraction, GBS library construction, and sequencing
In this study, 95 out of 150 mother trees were chosen for retrospective tissue sampling and mother-tree genotyping. In other words, genotype data are only available for 95 of the 150 mother trees. Mature fresh leaves were sampled from each family and mother tree (150 + 95 = 245 in total), immediately frozen in liquid nitrogen, and lyophilized prior to DNA extractions. Genomic DNAs of 95 mother trees and 150 families (pooled leaf tissue from 10 six-month-old individuals per each family) were isolated from young leaflets from 40 mg of dry leaves using the E-Z 96 Plant DNA Kit (Omega Bio-tek; Norcross, GA) according to the manufacturer's instructions. The DNA concentrations were determined using Qubit dsDNA High Sensitivity (HS) Assay Kits (InVitrogen, Life Technologies), after adjusting to 50 ng/μl 10 μl aliquots (500 ng in total) were used for the library preparation. GBS was performed on 245 walnut samples using a two-enzyme GBS protocol previously described [59] with a few modifications. Brief ly, individual DNA samples were digested using HindIII-HF (3 U) and MseI (1.5 U) followed by ligation of individual-specific barcode and common adapters using T4 DNA ligase. Ligated samples were pooled together (96-plex libraries) then amplified by polymerase chain reaction (PCR). Purification of final libraries was performed using magnetic beads (Omega Bio-Tek, GA, USA) for fragment selection in the range of 150-500 bp. Library size distribution was checked using a 2100 BioAnalyzer (Agilent Technologies, CA, USA). Three 96plex libraries were pooled to generate a single lane (288-plex) sequencing libraries. Libraries were sequenced through single end 90 bp sequencing read length using Illumina HiSeq4000 of the University of California Davis Genome Center (https://dnatech.genomecenter.ucdavis. edu/illumina-library-sequencing/).

SNPs calling, filtering, and imputation
SNP calling was performed using the TASSEL 5 GBS v2 SNP-calling pipeline [60] and the walnut cv. Chandler v2.0 reference genome [22]. 64 bp tags that occurred at least ten times were mapped using Burrows-Wheeler Aligner (BWA) with default parameters [61]. SNPs with average coverage below 1 (n = 22 328) or low coverage across individuals (n = 7885; data in <20% of individuals) were discarded before SNP calling, as were SNPs with extremely high coverage (n = 14 462; log(average coverage) > 2.75; likely repetitive DNA) and SNPs with an excess of heterozygous genotypes (n = 7067; inbreeding coefficient F < −0.05 in the mother trees only; likely paralogous SNPs). The latter two thresholds were established by comparing the proportion of SNPs matching the Axiom array at different thresholds ( Figure S14). Since some of our samples represent pooled tissue from multiple individuals, we relaxed the last threshold to discard only the few SNPs that showed the largest excess of heterozygous calls. After SNP calling, the resulting vcf file was filtered to exclude taxa and SNPs with >90% missingness and SNPs with minor allele count <20. Missing genotype calls were then imputed using Beagle 5.0 [62] using default parameters. After imputation, we further filtered the datasets to exclude SNPs with minor allele frequency less than 5% in both mother trees and their offspring (MGBS and PGBS) and SNPs with severe departures from the Hardy-Weinberg equilibrium only in mother trees (MGBS) using PLINK v1.9 software [63].
Genotyping with the Axiom J. regia 700 K SNPs array, and quality control The 95 mother trees were genotyped at 609658 SNPs evenly distributed throughout the walnut genome using the high-density walnut array from Affymetrix [16] as described by Arab et al (2019) [17].

Single and multi-trait genome-wide association mapping
GWAS was performed for 30 photosynthetic traits under well-water (WW), water-stress (WS) and recovery (WR) conditions. Specifically, the phenotypic data included: (i) average performance of families for each trait under WW, WS and WR conditions; (ii) DSI and DRI; (iii) PC1-PC5 of phenotypic data; and (iv) PCs of DSI and DRI values. Three SNP panels including the array data on 94 mother trees (n = 295 685 SNPs), GBS data on 87 mother trees (n = 40 828 SNPs) and GBS data on 136 families (n = 43 607 SNPs) were used separately for GWAS. A subset of traits was measured on 60 selected families, in which case GWAS was performed using just 60 individuals. Single trait GWAS was conducted by applying two different models implemented in the R package Genome Association and Prediction Integrated Tools (GAPIT v3.0 [64]): (i) the Fixed and Random Model Circulating Probability Unification (FarmCPU [65]), and (ii) the Multiple Loci Linear Mixed Model (MLMM [66]). Population structure (PCA) and genetic relatedness (kinship matrix) were calculated in GAPIT and included in the GWAS models as a random effect to control spurious associations [67]. Population structure analysis is described in detail in the supplementary file. The best number of PCs to include in the GWAS models was determined based on the Bayesian Information Criterion (BIC), as implemented in the "model selection" function of GAPIT and scree-plot of PCA results. Quantile-quantile (QQ) plots were applied to check if the model was correctly fitted. Bonferroni genome-wide thresholds were set to define significant associations in each dataset. The significant Bonferroni P-value thresholds were 1.7E-7, 1.22E-6 and 1.15E-6 for the MArray, MGBS, and PGBS datasets, respectively. The suggestive P-value thresholds (1/number of SNP markers) were 3.38E-6, 2.44E-5, and 2.29E-5 for the MArray, MGBS, and PGBS datasets, respectively. A lower suggestive Pvalue of 9.99 × 10 −5 (-log 10 P = 5) was also used to detect the suggestive SNP-trait associations for gene set enrichment and network analysis.

Gene annotation and pathway enrichment analysis
Significant and suggestive SNPs identified in the GWAS were used to search for putative candidate genes controlling photosynthetic traits in walnut. Flanking sequences of the significant and suggestive SNPs from both the SNP Array and GBS method were annotated using the National Centre for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov) BLASTX function. A SNP was considered to be located within a gene if it falls within a gene sequence. Then, based on the LD decay in our walnut panel [3], 20-kb windows were drawn around the identified marker-trait associations (10 kb upstream and downstream the SNP position) to search for candidate genes using the annotation v2.0 [22]. Enriched functional annotation clusters of the associated walnut candidate genes were defined using Blast2GO V5.0 tool [68] (E-value ≥1 × 10 −5 ) implemented in Gene Ontology (GO [69]) and Kyoto Encyclopedia of Genes and Genomes (KEGG [70]) databases. The protein-protein interactions (PPI) were identified using the STRING database [71] based on the candidate genes and the network was subsequently constructed using Cytoscape [72]. Furthermore, hub genes were identified in the PPI networks using cytoHubb [73].

Author Contributions
MMA: designed, performed the project, data analysis and wrote the manuscript. PJB, RAA and HA: directed the project, contributed to the data analysis, interpretation of the results and revised the manuscript. SA, AMB and SSS: designed some experiments, analysed the data and revised manuscript. CAL and AM: designed some parts of the project, and revised manuscript. MBM: helped with environmental data collation and respective analysis. DBN and KV: designed, directed the project and revised manuscript. All authors listed, have made direct and intellectual contribution to the work, and approved it for publication.

Data availability
All data supporting the findings of the present study are available within the paper and its supplementary material files.