Genomic Signatures of Adaptation to a Precipitation Gradient in Nigerian Sorghum

Evolution of plants under climatic gradients may lead to clinal adaptation. Understanding the genomic basis of clinal adaptation in crops species could facilitate breeding for climate resilience. We investigated signatures of clinal adaptation in the cereal crop sorghum (Sorghum bicolor L. [Moench]) to the precipitation gradient in West Africa using a panel (n = 607) of sorghum accessions from diverse agroclimatic zones of Nigeria. Significant correlations were observed between common-garden phenotypes of three putative climate-adaptive traits (flowering time, plant height, and panicle length) and climatic variables. The panel was characterized at >400,000 single nucleotide polymorphisms (SNPs) using genotyping-by-sequencing (GBS). Redundancy analysis indicated that a small proportion of SNP variation can be explained by climate (1%), space (1%), and climate collinear with space (3%). Discriminant analysis of principal components identified three genetic groups that are distributed differently along the precipitation gradient. Genome-wide association studies were conducted with phenotypes and three climatic variables (annual mean precipitation, precipitation in the driest quarter, and annual mean temperature). There was no overall enrichment of associations near a priori candidate genes implicated in flowering time, height, and inflorescence architecture in cereals, but several significant associations were found near a priori candidates including photoperiodic flowering regulators SbCN12 and Ma6. Together, the findings suggest that a small (3%) but significant proportion of nucleotide variation in Nigerian sorghum landraces reflects clinal adaptation along the West African precipitation gradient.

tropical to temperate zones has led to clinal adaptation in flowering time (Ducrocq et al. 2008;Buckler et al. 2009;Wu et al. 2013;Kloosterman et al. 2013) and cold tolerance (Comadran et al. 2012;Ma et al. 2015). Local adaptation of traditional varieties has played a key role in smallholder crop production under adverse climatic conditions and low agricultural inputs (Vasconcelos et al. 2013). Locally-adapted crop landraces possess alleles that can be beneficial for the development of improved varieties to ensure food security under stressful climates (Zeven 1998;Soler et al. 2013;Lasky et al. 2015).
Understanding genetic diversity, population structure, and genotype-phenotype associations in crop landraces can guide germplasm conservation and breeding (Djè et al. 2000;Soler et al. 2013;Dwivedi et al. 2016). Recent advances in genotyping technology have facilitated studies of genomic diversity in crops, including studies of local and clinal adaptation using population and quantitative trait genomics (Myles et al. 2009;Morrell et al. 2012;Savolainen et al. 2013). Popula- (Meyer et al. 2016;Li et al. 2017), and sorghum (Morris et al. 2013;Mace et al. 2013). In sorghum, quantitative trait genomics approaches using mixed linear model have identified genomic regions associated with adaptive traits and climatic variables (Morris et al. 2013;Zhang et al. 2015;Lasky et al. 2015).
Sorghum (Sorghum bicolor L. [Moench]) is an essential staple cereal crop in dryland regions of the world (National Research Council 1996). It has adapted to a wide variety of climatic gradients and has abundant phenotypic variation for flowering time, plant morphology, and inflorescence morphology (Morris et al. 2013;Zhang et al. 2015;Lasky et al. 2015). Globally, the morphological types (botanical races) of sorghum are distributed according to precipitation zones, with open-panicle guinea types predominant in humid regions, semi-compact caudatum types predominant in semi-arid regions, and compact-panicle durra types predominant in arid regions (Doggett 1988; Morris et al. 2013). In West Africa, sorghum is found across a steep north-south precipitation gradient, ranging from semiarid grasslands bordering the Sahara Desert in the north (Sahelian zone), through subhumid savannah (Sudanian zone), to humid forest zones in the south (Guinean zone). These regions have been subject to major droughts for several millennia (Shanahan et al. 2009) and increased drought under climate change is expected to reduce sorghum yields in this region (Lobell et al. 2008).
The West African country of Nigeria is Africa's most populous nation and its largest sorghum producer, with 5-10 million Mg of grain production per year (Nzeka and Akhidenor 2018). Sorghum is the major cereal in the northern Sudano-Sahelian region of Nigeria, which is characterized by prolonged dry seasons and short rainy seasons (National Research Council 1996). Sorghum, as a non-centric crop, has multiple centers of diversity and two of these overlap with the boundaries of Nigeria (Harlan 1971(Harlan , 1992. The genetic diversity of Nigerian sorghum is poorly characterized compared to other African sorghum germplasm (Rao et al. 1985;Deu et al. 2008;Barro-Kondombo et al. 2010;Leiser et al. 2014). Identifying genomic regions underlying adaptation in Nigerian sorghum germplasm could facilitate the identification of adaptive traits and genetic diversity relevant to crop improvement.
Given that sorghum is distributed across the precipitation gradient in Nigeria, we hypothesized that Nigerian sorghum germplasm has been shaped by clinal adaptation. Under this hypothesis, we expect precipitation variables to be associated with both phenotype (putative climate-adaptive traits) and genotype (population structure and SNPs). Further, we expect that trait-associated and climate-associated genome regions will colocalize with genes involved in putative climate-adaptive traits. We investigated these predictions in a large panel of georeferenced Nigerian genebank accessions, which were previously phenotyped and which we genotyped at high-density using GBS. We characterized patterns of association among climatic, phenotypic, and genotypic variables and tested colocalization of associated genomic regions. Overall, the patterns are consistent with a small contribution of clinal adaptation shaping genomic variation in Nigerian sorghum.

Plant materials
Seeds for 553 Nigerian accessions were obtained from the USDA National Plant Germplasm System (NPGS) (https://www.ars-grin.gov/). Seedlings were raised in a greenhouse for two weeks and 50 mg of fresh leaf tissue was collected from each accession into 96-well plates. A control well was left empty on each plate. Leaf tissue was lyophilized (Labconco Freeze Dryer, Kansas City, MO, USA) for two days and then ground using 96-well plate plant tissue grinder (Retsch Mixer Mill, Haan, Germany). Genomic DNA was extracted using BioSprint 96 DNA Plant Kit (QIAGEN, Valencia CA, USA), quantified using Quant-iTTM PicoGreen dsDNA Assay Kit (ThermoFisher Scientific, Waltham MA, USA), then normalized to 10 ng/ml.
Genotyping-by-sequencing GBS was conducted on 553 Nigerian accessions using methods previously described (Elshire et al. 2011;Morris et al. 2013). Briefly, individual DNA samples were digested using ApeKI restriction enzyme (NEB R0643L) followed by ligation of barcode and common adapters ligation using T4 DNA ligase (NEB M0202L). Ligated libraries were pooled (96-plex libraries) then amplified by polymerase chain reaction (PCR). Purification of libraries was performed using QIAquick PCR purification kit (QIAGEN, Valencia CA, USA). Library size distribution was obtained using a Bioanalyzer (Agilent Technologies 2100, Santa Clara CA, USA). Four 96-plex libraries were pooled to generate 384-plex sequencing libraries. Libraries were sequenced using single end 100-cycle sequencing using Illumina HiSeq2500 (Illumina, San Diego CA, USA) at the University of Kansas Medical Center, Kansas City MO, USA.
Sequence reads for Nigerian germplasm were combined with published sequence reads obtained for 1943 accessions (Lasky et al. 2015). The published sequence data were composed of globally diverse sorghum landraces with major representation by accessions from Africa and Asia. They were obtained from the United States NPGS-GRIN and the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) gene banks. From these 1943 georeferenced global accessions, sequence information from 158 Nigerian accessions was obtained and combined with the Nigerian NPGS set. Duplicated accessions and accessions with sorghum conversion (SC) numbers (i.e., with introgressions for early maturity and semi-dwarf genes) in the NPGS database were removed from the Nigerian germplasm. Thus, 607 Nigerian accessions (of which 443 were georeferenced; Figure  1A) and 1785 georeferenced global accessions were used for downstream analysis (Files S1 and S2). Reads were aligned to the sorghum reference genome v3.0 (McCormick et al. 2018) using Burrow Wheeler Alignment algorithm (Li and Durbin 2009). SNP calling was performed using TASSEL 5.0 GBS pipeline (Glaubitz et al. 2014). The SNPs were filtered for , 20% missingness, then missing data were imputed using BEAGLE 4.0 (Browning and Browning 2013).
Climate and phenotype data Climate data (average from 1960 to 1990) were obtained from WorldClim 1.4 using the Raster package in R (Hijmans et al. 2016) based on the coordinate (latitude and longitude) for each of the 443 georeferenced Nigerian accessions (File S1) and 1785 global accessions (File S2). As proxies for precipitation gradients that are hypothesized to affect Nigerian sorghum we investigated "annual precipitation" and "precipitation in the driest quarter". Common garden passport data for flowering time, plant height, and panicle length for the Nigerian accessions were obtained from the USDA-NPGS Germplasm Resource Information Network database (https://www.ars-grin.gov/). The passport data were based on evaluations in one or more common garden experiments in tropical latitudes (Puerto Rico and St. Croix,(17)(18), so best linear unbiased predictors (BLUPs) were estimated for each trait for each accession (File S1). A term for common garden was fit as random in the BLUP estimation model using lmer function in LME4 package in R (Bates et al. 2014) as follows: where y i is the vector of phenotypic observation of the i th accession, g ij is the j th common garden where i th accession was evaluated, and e ijk is the residual or error term. Pearson correlations were calculated between BLUPs of three adaptive traits (flowering time, panicle length, and plant height), and environmental factors (latitude, temperature and precipitation). To reduce the influence of outlier sites with exceptional climatic variables, precipitation values of three geographical locations where sorghum is not commonly cultivated were removed. Analysis of variance (ANOVA) and Tukey HSD test in R were performed to identify precipitation differences among sites of origin for different genetic groups or botanical races.

Redundancy analysis
Redundancy analysis (RDA) was performed separately for global and Nigerian germplasm sets using the varpart function in the R vegan package (Oksanen et al. 2017). A multivariate model was fit using the genomic data (431,698 SNPs for the global accession and 279,689 SNPs for the Nigerian accessions, filtered for monomorphic and singleton markers) as response variable. Ten WorldClim 1.4 climatic variables (annual mean temperature, mean temperature wettest quarter, mean temperature driest quarter, mean temperature warmest quarter, mean temperature coldest quarter, annual precipitation, precipitation wettest quarter, precipitation driest quarter, precipitation in the warmest quarter, and precipitation in the coldest quarter) and geographical variables (latitude and longitude, which we refer to as "space") were fitted as predictor terms. The "space" term is included to account for isolation-by-distance . To test the significance of the proportion of variation explained by climate collinear with space in the Nigerian germplasm, the proportion of variation explained was compared to the distribution from 1000 permuted data sets. In each stage of the permutation, individuals (genotypes) were randomized and RDA regression fitted and repeated 1000 times.
Population structure and linkage disequilibrium analyses Discriminant analysis of principal components (DAPC) was conducted with the find clusters function in Adegenet package in R (Jombart 2008;Jombart et al. 2010). Population differentiation (F ST ) between DAPC groups was estimated using -weir-fst-pop parameter (Weir and Cockerham's F ST ) in VCFtools (Danecek et al. 2011). While nucleotide diversity within DAPC groups was estimated using -window-pi (1kb) in VCFtools. LD decay analysis for each DAPC group was performed by PopLDdecay (BGI-shenzhen 2017). For comparison with Nigerian germplasm, West African accessions were identified from the published global GBS data (Lasky et al. 2015). The published GBS data were composed of global accessions from 55 countries, predominantly representing landraces from sub-Saharan Africa and Asia. In the text, "global" refers to all accessions (including Nigerian and other West African accessions), unless otherwise noted that Nigerian or West African accessions have been removed.
Linkage disequilibrium decay for the genomic data for Nigerian, West African, and global germplasm was estimated by PopLDdecay (BGI-shenzhen 2017), with minor allele frequency parameter set at 0.05 and smoothing by the spline function in R. Principal component analysis (PCA) was performed using SNPrelate package in R (Zheng et al. 2012) with LD pruning threshold parameter set to 0.5 and minor allele frequency parameter set to 0.05. Neighbor-joining analysis was performed using TASSEL 5.0 and visualized in APE (Analyses of Phylogenetics and Evolution) package in R (Paradis et al. 2004). Population differentiation (F ST ) between Nigerian, West African and global germplasm was evaluated using -weir-fst-pop parameter (Weir and Cockerham's F ST ) in VCFtools (Danecek et al. 2011). While nucleotide diversity within each germplasm, their inbreeding coefficients, and observed heterozygosity were estimated using -window-pi (1kb window), -het, and -hardy respectively, in VCFtools.

Genome-wide association studies
Genome-wide association studies (GWAS) were performed using BLUPs of traits (panicle length, n = 330; plant height, n = 332; and flowering time, n = 412). After filtering the Nigerian data for a minimum minor allele frequency (MAF) of 0.03, a total of 149,342 SNPs were used in the GWAS analysis. First, a multi-locus mixed linear model (MLMM) ) with a fixed population term (Q) and a random polygenic term (K) was used to perform GWAS for the phenotypic traits. PCA components (first three PCs) used for Q term were estimated using TASSEL 5.0 (Bradbury et al. 2007) and kinship matrix used for the polygenic term was derived from GAPIT (Lipka et al. 2012). Bonferroni correction of 2.6e-07 (a/number of markers, where a = 0.05) was used to determine the cut-off threshold for the phenotypic associations. A set of a priori candidate genes was compiled from Phytozome including known sorghum genes, and sorghum homologs of rice and maize genes known to be involved in inflorescence morphology, maturity, and plant height (n = 169; File S3).

Genome scans
Three environmental variables (annual precipitation, precipitation in the driest quarter, and annual mean temperature) were used as proxies for the precipitation gradient (n = 443). A GLM, which does not include population structure and kinship terms, was used to perform an association scan for climatic variables to reduce false negatives (Bergelson and Roux 2010;Lasky et al. 2015). The top 1% outliers of the environmental associations were selected for enrichment analysis. Enrichment of a priori genes near association peak were performed using a chi-square test. Windows of 100 kb were used as conservative regions for colocalization between SNPs and a priori genes since LD decayed to background levels at . 100 kb. The genome wide Tajima's D across 100 kb windows was tested using VCF tools in global germplasm (D Global ), West African germplasm (D WestAfrica ), and Nigerian germplasm (D Nigeria ). Enrichment analysis for a priori genes was performed by testing whether the D Nigeria 100 kb windows were significantly enriched for our a priori candidate genes relative to a set of random genes derived from the sorghum genome version 3 gff3 gene file from Phytozome (Goodstein et al. 2011;McCormick et al. 2018) for 1000 whole genome permutations.

Data availability
Raw sequencing data are available from the NCBI Sequence Read Archive under project accession SRP132525 SNP genotype, phenotype, and geographic data are available at Dryad (doi:10.5061/dryad. g0141g7). All data are publicly available. File S1 contains detailed descriptions of Nigerian accessions, their passport data, georeference information, the BLUPs of phenotypes, climatic data, and DAPC groups.
File S2 contains detailed descriptions of global accessions, their georeference information, and climatic data. File S3 contains a priori candidate genes list and literature sources. File S4 contains ANOVA and Tukey test results for race by precipitation analysis. File S5 contains detailed descriptions of a priori candidate genes associated with significant SNPs for MLMM and GLM GWAS results for the phenotypes. File S6 contains detailed descriptions of a priori candidate genes associated with outlier SNPs for GLM of environmental variables. File S7 contains detailed descriptions of a priori candidate genes associated with Nigerian germplasm Tajima's D (D Nigeria ) windows. Supplemental material available at Figshare: https://doi.org/10.25387/g3.6942986.

Trait and environment correlations
The georeferenced sorghum accessions from Nigeria originated across a wide precipitation gradient ( Figure 1A). Annual precipitation at the locations of origin ranges from , 500 mm/year for the northern Nigerian accessions (Sahelian zone) to . 3500 mm/year for the southern Nigerian accessions (Guinean zone). Analysis of variance indicates a significant difference in annual precipitation at sites of origin among the botanical races (P-value , 0.001). Significant differences were found between guinea and caudatum types (P-value , 0.05) and between guinea and durra-caudatum types (P-value , 0.01) (File S4) in precipitation differences among sites of origin.
The correlation of annual precipitation with common garden phenotypes of georeferenced Nigerian germplasm was investigated for three traits (panicle length, plant height, and flowering time) ( Figure 1B-D). For comparison, we also considered correlations with annual mean temperature and latitude. Significant positive relationships with annual mean precipitation were observed for panicle length (Figure 2A; r = 0.21, P-value , 0.001) and plant height ( Figure 2B; r = 0.22, Pvalue , 0.001) but not flowering time ( Figure 2C). By contrast, annual mean temperature had no correlation with panicle length ( Figure 2D) or plant height ( Figure 2F) but a significant negative relationship with flowering time ( Figure 2F; r = -0.19; P-value , 0.001). Latitude had negative relationships with panicle length ( Figure 2G; r = -0.19, P-value , 0.001) and plant height ( Figure 2H; r = -0.26, P-value , 0.001), but no relationship with flowering time ( Figure 2I). Among the traits, flowering time had significant positive relationships with both panicle length and plant height ( Fig. S1; r = 0.32 and 0.41, respectively; P-values , 0.001). Among the environmental variables, annual mean precipitation had a significant negative relationship with latitude ( Fig.  S1; r = -0.86, P-value , 0.001) and a weaker but significant negative correlation with annual mean temperature ( Fig. S1; r = -0.23, P-value , 0.001).

Genome-wide nucleotide variation
To investigate genomic variation we developed a data set consisting of 431,698 SNPs genotyped across 2392 accessions (Nigeria, West Africa, and global). Most of the SNP variation in the three panels was rare; about 51% of the Nigerian genomic data were composed of SNPs with minor allele frequencies (MAF) , 0.01, 46% of the West African SNPs have MAF , 0.01, 36% of the global reference SNPs have MAF , 0.01, and 37% of global SNPs have MAF , 0.01 (Fig. S2A). The mean observed heterozygosity across loci in each of the germplasm is 0.02 (2%) (Fig.  S2B). SNP density was higher in sub-telomeric regions and lower in sub-centromeric regions (Fig. S3). In the Nigerian germplasm (n = 607 accessions; Figure 1A), 279,689 SNPs were retained after removing monomorphic markers, singletons, and doubletons from the initial 431,698 SNPs. This corresponds to an average of 1 SNP per 2.7 kb.
The nucleotide diversity of Nigerian germplasm was similar to the West African and global germplasm. The average nucleotide diversity for global germplasm (p global ) and West African germplasm (p WestAfrica ) was 4.5 · 10 24 . The nucleotide diversity in Nigerian germplasm (p Nigeria ) was somewhat lower at 4.0 · 10 24 . The average inbreeding coefficients were 0.83, 0.82, and 0.80 for global germplasm, West African germplasm, and Nigerian germplasm, respectively (all significantly different from each other at P-value , 0.01) (Fig. S2C). The F ST between Nigerian and West African germplasm was 0.007 (Fig. S4A) and F ST between Nigeria and global germplasm was 0.07 (Fig. S4B). The genome-wide average rate of linkage disequilibrium decay differed among the panels. Of the three germplasm sets, the global germplasm had the fastest LD decay rate by reducing to r 2 = 0.1 at 20 kb (Fig. S5). LD decayed to half its initial value at 12 kb and r 2 = 0.1 at 50 kb in the Nigerian germplasm. The West African germplasm had a slowest LD decay rate among the three sets, with r 2 = 0.1 at 100 kb.

Redundancy analysis
To estimate the proportion of SNP variation that has been shaped by climate vs. geographic distance (space) we carried out a redundancy analysis, a form of multivariate regression. The proportion of SNP variation explained by climate and space in global germplasm was substantially greater than in Nigerian germplasm. In global germplasm, the proportion of variation explained by the 10 climate variables, space (latitude and longitude), and their combination together are 4%, 8%, and 5%, respectively ( Figure 3A). By contrast, in Nigerian germplasm, climate and geographic variables explained a smaller proportion of the total SNP variation ( Figure 3B); climate and space alone each explained 1% of the SNP variation, while climate collinear with space explained 3% of the SNP variation. The proportion of variation explained by climate collinear with space was significantly greater (P , 0.001) than the null distribution from geographically permuted data ( Figure 3C).

Population structure analysis of the Nigerian germplasm
To characterize the genetic structure of Nigerian germplasm in relation to global sorghum diversity, we conducted PCA and DAPC. In the PCA, the first PC explained about 6% of the variation while the second PC explained about 4% of the variation. The Nigerian accessions formed mostly separate clusters relative to the global accessions. The West African and Nigerian germplasm clustered together in most cases (Fig.  S6A). Neighbor joining analysis also showed that Nigerian accessions and West African accessions cluster together, separately from the rest of the global germplasm (Fig. S7A). Clustering by botanical race was also observed in the Nigerian germplasm ( Fig. S6B and Fig. S7B).
DAPC analysis identified three genetic groups ( Figure 4A-C). The DAPC groups were genetically differentiated from each other as follows: Group 1 vs. Group 2 (F ST of 0.21), Group 1 vs. Group 3 (F ST of 0.18), and Group 2 vs. Group 3 (F ST of 0.22). Accessions in Group 2 originate from locations with higher precipitation than the accessions in Group 1 (P-value , 0.001) and Group 3 (P-value , 0.001) ( Figure  4D). The average nucleotide diversity in 1kb windows for groups 1, 2, and 3 are 4.4 · 10 24 , 3.3 · 10 24 , and 3.8 · 10 24 respectively. Linkage disequilibrium (r 2 ) level decayed to 0.1 at 30 kb in in Group 1, 80 kb in Group 2, and 90 kb in Group 3 (Fig. S8). Caudatum types are more predominant in Group 1, Guinea types are more predominant in Group 2, and Durra types are more predominant in Group 3 (Table 1).
In order to characterize variation among genetic groups identified by DAPC, ANOVA, and Tukey test were performed for putative adaptive traits and environmental variables. Significant difference in the distribution of precipitation and temperature gradient were found between DAPC Groups 1 and 3 (P-value , 0.01) and Groups 2 and 3 (P-value , 0.01). Also, there were significant differences between Groups 2 and 1 (P-value , 0.001) and Groups 2 and 3 (P-value , 0.001). We also found significant differences for putative adaptive traits between DAPC groups. For panicle length, all DAPC groups comparisons were different at P-value , 0.01. Significant differences were found for flowering time distribution between Groups 1 and 2 (P-value , 0.001). For plant height, significant differences were found between Groups 1 and 2 (P-value , 0.001) and Groups 2 and 3 (P-value , 0.001).

Genome-wide association studies of phenotypes
To identify genomic regions associated with phenotypic variation, we conducted MLMM GWAS for panicle length, plant height, and flowering time using BLUPs of common-garden phenotypes. Several genomic regions associated with the traits were identified and two a priori candidate genes fell within 100 kb ( Figure 5, File S5). For panicle length no associations were significant at the Bonferroni threshold ( Figure 5A). For plant height, a single significant association was observed on chromosome 3 ( Figure 5B). The single significant association for plant height (S3_62675143, MAF = 0.29) colocalized with photoperiodic flowering gene SbCN12 (Sobic.003G295300, 73 kb away) ). For flowering time, nine significant associations were found on chromosomes 3, 6, 7, 9, and 10 ( Figure 5C). The most significant flowering time association (S6_799609, MAF = 0.09) colocalized with the known sorghum flowering time and photoperiod sensitivity gene Maturity6 (Ma6/Ghd7, Sobic.006G004400, 99 kb from gene) . With a naive model (GLM GWAS), nominally significant associations were found on all chromosomes for panicle length, plant height, and flowering time, and a large number of these colocalized with a priori candidate genes (Fig. S9).

Genome scans for adaptation
Associations with environmental variables were used to investigate possible genomic signature of climate adaptation. GLM outliers (top 1% of associations) were identified on all chromosomes for "annual precipitation", "precipitation in the driest quarter," and "annual temperature" (Figure  6 A-D and File S6). Genome-wide, 15% and 6% of a priori candidate genes were localized within 100 kb of a 1% outlier SNP for "annual precipitation" and "precipitation in the driest quarter", respectively (vs. 17% and 16% of all genes). Therefore, there is no enrichment of a priori candidate genes near environmental association outliers. A few of the a priori candidate genes that localize near 1% outlier SNPs are as follows. The SNP S9_54870238 (MAF = 0.36; 99 th percentile) associated with annual precipitation was 90 kb away from the regulator of photoperiodic flowering SbCN8 (Centroradialis8, Sobic.009G199800). S9_8022437 (MAF = 0.49; 99 th percentile) associated with annual precipitation is 94 kb from the sorghum ortholog (Sobic.009G069700) of maize barren inflorescence4 (bif4) (Galli et al. 2015). S3_4891237 and S3_4750963 (MAF = 0.31, 98 th percentile and MAF = 0.04, 99 th percentile) associated with annual precipitation and precipitation in the driest quarter were 100 kb from Sbra2 (Sobic.003G052900), the sorghum ortholog of maize inflorescence gene ramosa2 (ra2) (Brown et al. 2006).
Genome-wide pattern of Tajima's D in the Nigerian germplasm (D Nigeria ) across 1 kb windows ranged between -2.0 to 4.0 ( Fig. S10 and  Fig. S11). The average Tajima's D value in the Nigerian germplasm (D Nigeria ) was -0.2 while the average genome wide Tajima's D (across 1 kb windows) in the global germplasm (D Global ) and West African germplasm (D WestAfrica ) were 0.1 and 0.2, respectively. Positive D Nigeria windows were significantly enriched for a priori candidate genes compared to the expectation under a null distribution (Fig. S12 and File S7). Some of the D Nigeria windows (Fig. S10) contain genes that control for flowering time and inflorescence development (File S7). For instance, the sorghum flowering time gene Maturity6 (Ma6, Sobic.006G004400) colocalized with the genome wide D Nigeria scan window at 0.697 Mb (Tajima's D = 1.48, 89 th percentile) on chromosome 6. Maturity1 (Ma1, Sobic.006G057866) colocalized with the genome wide D Nigeria scan window (Tajima's D = 2.9; in the 98 th percentile) around 40 Mb on chromosome 6. The sorghum ortholog of branched silkless1 (Sobic.002G411000), a maize spikelet meristem identity gene (Chuck et al. 2002), colocalized with the genome wide D Nigeria scan window (Tajima's D = 2.2 in the 94 th percentile) around 75.9 Mb on chromosome 2.

DISCUSSION
Evidence for clinal adaptation to the precipitation gradient Genome-wide studies of nucleotide variation can provide insights into patterns of genetic variation in crop landraces and the role of clinal adaptation in shaping this variation (Meyer et al. 2016). Overall, we found several lines of evidence that sorghum phenotypic variation across Nigeria has been shaped by clinal adaptation to precipitation. For panicle length, common garden variation in the Nigerian sorghum germplasm was correlated with annual precipitation (Figure 2, Fig. S1). Sorghum accessions originating from lower latitudes that have high precipitation had longer panicles than accessions originating from higher latitudes that have less precipitation (Figure 2A, 2G). The long panicle morphology is associated with open and lax primary branches, which is a key feature of guinea race which are predominant in humid to sub-humid regions of Nigeria ( Figure 1A-B) and West Africa more generally (Deu et al. 2008;Barro-Kondombo et al. 2010;Lasky et al. 2015). This open panicle morphology is thought to allow airflow, reducing mold infection under high humidity (Doggett 1988), though this model has not been formally tested in diverse germplasm.
For plant height, sorghum accessions from lower latitudes associated with high precipitation were taller than sorghum accessions from higher latitudes associated with less precipitation (Figure 2B, 2H). This pattern is consistent with cross-species ecological studies of plant height, which identified precipitation as the best environmental predictor of withinspecies latitudinal variation of height (Moles et al. 2009). Given that higher latitudes in Nigeria have lower rainfall, reduced plant height and panicle length in dry regions may be an adaptation to increase yield stability under reduced water availability, as has been observed in West African pearl millet (Vigouroux et al. 2011). In our study, common garden variation in flowering time is associated with temperature but not precipitation at the location of origin. The negative relationship between flowering time and annual mean temperature ( Figure 2F) suggests that sorghum in hot climates may flower early as an escape from high temperature and resulting water limitation (Tuinstra et al. 1997).
The significant proportion of nucleotide variation explained by climate collinear with space (3%; Figure 3) is consistent with clinal adaptation of sorghum in Nigeria. Redundancy analysis indicated that climate collinear with space explained more SNP variation in the Nigerian germplasm than either of climate and space (isolation-by-distance) alone. The finding that climate collinear with space explained more SNP variation than either climate or space alone is consistent with findings in global sorghum germplasm (Lasky et al. 2015) and regional germplasm in wild soybean (Glycine soja) (Leamy et al. 2016) and barley (Hordeum vulgare) landraces (Abebe et al. 2015). However, the proportion of SNP variation explained by climate collinear with space we observed in this study was much lower than what was observed in wild soybean and barley landraces (6-34% and 29-61%, respectively). Methodological differences that may contribute to the lower proportion of SNP variation explained in this study are the greater number of environmental variables used in the soybean and barley studies, and the use of ascertained SNPs in the soybean study.
Genome-wide nucleotide variation in Nigeria is also structured according to precipitation zones. The DAPC analysis identified genetic groups within the sorghum botanical races. These genetic groups showed differences in precipitation distributions ( Figure 4C-D, Table 1). Group 1 was associated with the lowest annual mean precipitation, and composed predominantly of caudatum, caudatum intermediates and bicolor types and prevalent at higher latitudes in northeastern Nigeria characterized with lower annual precipitation. The northeastern part of Nigeria was classified as part of the center of diversity of caudatum and caudatum-durra (Harlan 1992). Group 2 was associated with higher annual precipitation distribution and more prevalent at lower latitudes. Most of the accessions in this group belong to the guinea and guinea intermediate racial types. Group 3 was predominantly made up of durra and caudatum-durra intermediates. Notably, there was a complete absence of the guinea race from this group. Consistent with the model that botanical races in sorghum are differentially adapted to precipitationbased agroclimatic zones (Harlan 1992;Morris et al. 2013), we found differences in botanical race distribution in precipitation zones (File S7). Precipitation distribution of guinea accessions was significantly different from precipitation distribution of caudatum (P-value , 0.05) and caudatum-durra races (P-value , 0.01).
Despite some evidence of clinal adaptation, we should note that the redundancy analysis ( Figure 3) and theoretical considerations (Pavlidis et al. 2012;Meirmans 2015) suggest that bulk of the nucleotide Figure 5 Genome wide association studies of phenotypes for putative adaptation traits. Genome-wide SNP associations for multi-locus mixed model with a fixed population term and a random polygenic term for (A) panicle length, (B) plant height, and (C) flowering time in Nigerian germplasm (n = 329, 331, and 411 respectively). Vertical dashed lines represent position of a priori candidate genes (Ma6 and SbCN12) and found within 150 kb region of the significant association. Horizontal broken red lines signify the Bonferroni correction threshold (P-value , 0.05).
n variation observed in the Nigerian germplasm is neutral. In addition, the small proportion of phenotypic and genomic variation explained by the precipitation gradient ( Figure 3B-C; Figure 2) suggest a modest role of clinal adaptation in shaping phenotypic diversity at the geographic scale of modern Nigeria. However, more detailed studies of panicle and vegetative morphology may reveal traits with stronger climate associations. Other factors that may have shaped the observed diversity patterns which we did not investigate include seed sharing based on ethnolinguistic grouping (Soler et al. 2013;Labeyrie et al. 2014) and historical processes of domestication and diffusion (Kimber et al. 2013;Morris et al. 2013). Consistent with a major role of cultural and historical factors shaping diversity at a Nigeria-wide scale, in the cases where multiple accessions were collected from single locations, multiple botanical races and genetic group were observed ( Figure 1A, File S1).

Identifying putative loci underlying adaptation
The combination of phenotypic association, environmental association, and selection scans can provide multiple lines of evidence for the involvement of particular loci in adaptation (Meyer et al. 2016). In the Nigerian sorghum germplasm, we observed some cases where a priori candidate genes related to adaptive traits colocalized with GWAS signals (Figure 5-6; File S5-S6). The colocalization of two sorghum photoperiod sensitivity genes (SbCN12 and Ma6) with plant height and flowering time associations suggests that photoperiod sensitivity contributes to plant height and flowering time adaptation in Nigerian sorghum germplasm. This is consistent with previous studies that identified SbCN12 and Ma6/Ghd7 as major genes underlying natural variation in photoperiodic flowering in sorghum Yang et al. 2014;Bouchet et al. 2017). In sub-tropical latitudes, like the common gardens used by the USDA genebank ($17°N), photoperiod sensitive sorghums are expected to have longer vegetative growth and attain greater heights than in tropical latitudes due to longer days at the higher latitudes . Given that strong association peaks for both environmental variables (annual precipitation and precipitation in the driest quarter; Figure 6) were found near SbCN8, this gene may be a candidate for clinal adaptation to the precipitation gradient in West Africa. The colocalization of inflorescence genes (ra2 and bif4) with associations for annual precipitation ( Figure 6A; File S5-S6) is consistent with an adaptive role of inflorescence morphology across the precipitation gradient (Harlan 1992;Morris et al. 2013). Accounting for population structure in GWAS models when mapping phenotypes that are correlated with population structure can lead to false negatives (Bergelson and Roux 2010;Lasky et al. 2015). Notably there were no significant associations for panicle length after accounting for population structure (MLM; Figure 5A). This finding is consistent with the expectation that climate-associated traits will be confounded with population structure and there will be little power to map the genetic basis of these traits using mixed model association (Brachi et al. 2011). Genetic dissection of panicle morphology and other putative clinal adaptation traits should be more effective with a multi-parent mapping strategy that breaks up confounding population structure, such as nested association mapping (Buckler et al. 2009;Bouchet et al. 2017).
Long-term stable clinal adaptation is expected to be reflected in genomic signals of balancing selection (Novembre and Rienzo 2009;Yoder et al. 2014). Consistent with the phenotypic evidence of clinal trait adaptation (Figure 2A, 2B, 2F), there was evidence of balancing selection from genome-wide enrichment of D Nigeria windows at a priori candidate genes for flowering time, plant height, and inflorescence morphology ( Fig. S11 and Fig. S12). Further functional studies of candidate genes will be needed to establish if these candidate genes have a role in climate adaptation (Kesari et al. 2012;Romero Navarro et al. 2017). Resequencing of georeferenced germplasm should facilitate the identification of putative functional variants underlying clinal adaptation (Gore et al. 2009;Meyer et al. 2016). Demographic effects due to historical processes of domestication and differentiation affect the pattern of diversity under neutrality (Molina et al. 2011;Meyer and Purugganan 2013). While it is suspected that sorghum has a complex domestication history based on morphology (Harlan 1992), population structure (Deu et al. 2006), and shattering gene haplotypes (Shattering1) Figure 6 Genome wide association studies of climatic variables. Genomewide SNP associations for general linear model of (A) annual mean precipitation, (B) precipitation in the driest quarter, (C) and annual mean temperature. Horizontal broken lines indicate the cutoff point of top 1% outliers. Vertical broken lines indicate the position of a priori candidate genes related to inflorescence morphology (orange), plant height (blue), and flowering time (purple), which are within 100 kb from associated SNPs. (Olsen 2012), the demographic scenarios have not been formally described or evaluated. Therefore, further study on demographic history in sorghum will be valuable to identify which genomic outlier loci are most likely to represent signatures of selection.
If variation at genes involved in putative adaptive traits (flowering time, plant height, and panicle length) underlie clinal variation, then we expect significant enrichment of the a priori genes with environment associations (Rellstab et al. 2015). However, there was no enrichment of a priori candidate genes near associations with "precipitation in the driest quarter" or "annual mean precipitation". An alternative to the single-trait GWAS and colocalization approach used in this study would be a multi-trait GWAS approach Zhou and Stephens 2014) simultaneously considering traits and environmental variables. However, biological interpretation of the synthetic or "eigen" traits may remain a challenge until more corroborating functional genetic data are available (Banerjee et al. 2008).

Resources for genomic-enabled breeding of clinal adaptation
The Nigerian germplasm harbors abundant nucleotide polymorphism (90% of the global nucleotide polymorphism based on the p Nigeria vs. p global ), consistent with West Africa as a center of diversity for sorghum (Harlan 1992). The high diversity could be a result of ancestral diversity in the Nigerian sorghum, and/or gene flow and diffusion from other regions of Africa (Westengen et al. 2014). Population structure and neighbor-joining tree analysis showed that majority of the Nigerian accessions and West African accessions clustered together ( Fig. S6 and Fig. S7). The Nigerian germplasm was 10 times less differentiated from West African germplasm compared to the rest of global sorghum germplasm (F ST Nigeria-West Africa = 0.007 and F ST Nigeria-Global = 0.07). Low level of differentiation among regional germplasm in sorghum has been attributed to gene flow from human migration and agricultural trade (Menkir et al. 1997).
Characterization of LD patterns is critical for interpretation of genome scans since the local extent of LD decay determines resolution of mapping and long-range LD creates spurious associations (Myles et al. 2009). LD decay rate in Nigerian germplasm (half of maximum value at 12 kb) was slower than LD decay estimated in the global sorghum association panel (half of maximum value at 1 kb) (Morris et al. 2013). The global sorghum association panel may capture more historical recombination than the Nigerian germplasm because of its greater geographic diversity. The reduced long-range LD decay in the Nigerian germplasm compared to the West African germplasm (Fig.  S5) may be due to the smaller geographic scale, which should reduce long range LD due to isolation-by-distance (Brachi et al. 2011). Given the observed LD decay rates, the mapping resolution for genome-wide scans in the Nigerian germplasm is expected to be less than in global sorghum panels but greater than in West African regional panels. Overall, the modest LD decay rate and high genetic diversity in the Nigerian germplasm make it suitable for genomewide association studies.
The application of genomics for crop improvement and plant genetic resource management is still in the early stage for most national agricultural research systems in sub-Saharan Africa, including Nigeria and neighboring countries (Ezeaku and Gupta 2004;Leiser et al. 2014;Yohannes et al. 2015). The genomic resources developed in this study represent a step toward genomics-enabled breeding and germplasm management for sorghum landraces in Nigeria. The resources developed include a genome-wide catalog of SNP variation, a description of geographic population structure, and estimates of genetic properties including nucleotide diversity and LD decay. The genomic signatures of clinal adaptation identified in this study, if validated in managed stress and multi-environment mapping studies (Cooper et al. 2014;Lasky et al. 2015), could be another resource to facilitate genomicsenabled breeding for climate-resilience in West Africa.