Genome-Wide Association Study for Maize Leaf Cuticular Conductance Identifies Candidate Genes Involved in the Regulation of Cuticle Development

The cuticle, a hydrophobic layer of cutin and waxes synthesized by plant epidermal cells, is the major barrier to water loss when stomata are closed at night and under water-limited conditions. Elucidating the genetic architecture of natural variation for leaf cuticular conductance (gc) is important for identifying genes relevant to improving crop productivity in drought-prone environments. To this end, we conducted a genome-wide association study of gc of adult leaves in a maize inbred association panel that was evaluated in four environments (Maricopa, AZ, and San Diego, CA, in 2016 and 2017). Five genomic regions significantly associated with gc were resolved to seven plausible candidate genes (ISTL1, two SEC14 homologs, cyclase-associated protein, a CER7 homolog, GDSL lipase, and β-D-XYLOSIDASE 4). These candidates are potentially involved in cuticle biosynthesis, trafficking and deposition of cuticle lipids, cutin polymerization, and cell wall modification. Laser microdissection RNA sequencing revealed that all these candidate genes, with the exception of the CER7 homolog, were expressed in the zone of the expanding adult maize leaf where cuticle maturation occurs. With direct application to genetic improvement, moderately high average predictive abilities were observed for whole-genome prediction of gc in locations (0.46 and 0.45) and across all environments (0.52). The findings of this study provide novel insights into the genetic control of gc and have the potential to help breeders more effectively develop drought-tolerant maize for target environments.

. Cuticle composition is also modulated by environmental factors. Light, temperature, and osmotic stress influence both the quantity and composition of cuticular waxes (Shepherd and Wynne Griffiths 2006;Kosma and Jenks 2007). While high relative humidity usually suppresses wax production (Sutter and Langhans 1982;Koch et al. 2006), drought stress has been found to change cuticular wax composition (Panikashvili et al. 2007;Kosma et al. 2009) and increase wax deposition in several crop species (Shepherd and Wynne Griffiths 2006;Cameron et al. 2006;Kosma and Jenks 2007).
In addition to modulation of its composition by drought stress, a variety of other observations support a role for the cuticle in drought tolerance. Most mutations and other genetic modifications affecting cuticle composition or wax load increase its permeability to water, and this has often been associated with decreased drought tolerance (Zhou et al. 2013;Zhu and Xiong 2013;Li et al. 2019). In a few cases, overexpression of cuticle lipid biosynthetic enzymes, or their transcriptional regulators, increased cuticular lipid abundance and drought tolerance (Aharoni et al. 2004;Zhang et al. 2005;Bourdenx et al. 2011;Wang et al. 2012). Glaucousness (a visible trait resulting from abundant accumulation of epicuticular wax crystals on shoot tissue surfaces) was selected for during the domestication of wheat and involves genes regulating wax biosynthesis (Hen-Avivi et al. 2016;Bi et al. 2016); analyses of heritable variation in glaucousness in wheat and barley has revealed positive correlations between this trait and drought tolerance (Febrero et al. 1998;Guo et al. 2016). These findings point to the potential relevance of cuticle modification for increasing drought tolerance in cereal crops.
There is a longstanding interest in breeding for cuticle-related traits such as drought tolerance, but the complexity of the relationship between cuticle composition and resistance to dehydration, and the lack of methods amenable to high-throughput phenotyping for cuticle-associated traits, has hampered progress in this area (Petit et al. 2017). Genome-wide association studies (GWAS) (Yu et al. 2006;Zhang et al. 2010;Lipka et al. 2015), taking advantage of natural variation in cuticle permeability and historical recombination events within a single species, offer an attractive approach to identifying genes and alleles that can decrease cuticle permeability. Additionally, findings from GWAS of cuticle-related phenotypes could be used to better inform the application of genomic selection strategies (Meuwissen et al. 2001;Lorenz et al. 2011) for increasing drought tolerance in crop species.
In this study, we utilized GWAS combined with laser microdissection RNA sequencing (LM-RNAseq) of epidermal tissue samples, from zones of the expanding maize leaf where the cuticle develops, to implicate candidate genes controlling the water barrier function of the maize leaf cuticle. Methods directly measuring permeability of the cuticle to water (Valeska Zeisler-Diehl et al. 2017) are not amenable to the high-throughput phenotyping needed for GWAS involving analysis of thousands of samples. Thus, we utilized a phenotyping strategy that indirectly measures cuticle water barrier function by calculating the drying rates of detached leaves placed in the dark to close stomata (Ristic and Jenks 2002). While some water may be lost via stomata that are not completely sealed, the sealing of stomata is thought to depend on cuticular flaps lining the stomatal pore that overlap when stomata are closed (Zhao and Sack 1999;Kosma and Jenks 2007). Thus, we used this phenotyping strategy with the expectation that it primarily measures cuticle-dependent water loss, which we refer to as adult leaf cuticular conductance (g c ). Maize is most sensitive to drought stress at flowering (Grant et al. 1989), when juvenile leaves have already senesced and only adult leaves remain. Thus, we utilized adult leaves of field-grown plants for this analysis. Moreover, since cuticle composition is known to vary depending on the growth environment (Shepherd and Wynne Griffiths 2006), we utilized plants grown in two contrasting environments: the cooler and more humid environment of San Diego, CA, and the hotter and more arid environment of Maricopa, AZ. We aimed to (i) detect genomic regions associated with natural variation in g c ; (ii) enhance selection of candidate genes impacting this trait through an LM-RNAseq analysis of the expanding maize adult leaf; and (iii) evaluate whole-genome prediction models to facilitate genomic selection on this trait, providing tools for potential improvement of drought-tolerance in maize for target environments.

Plant materials and experimental design
A set of 468 maize inbred lines from the Wisconsin Diversity panel (Hansey et al. 2011) was evaluated for adult leaf cuticular conductance (g c ). The inbred lines were planted at the Maricopa Agricultural Center, Maricopa, AZ, and University of California San Diego, San Diego, CA, in 2016 and 2017. The layout of the experiment in each of the four environments (location · year combination) was arranged as an augmented incomplete block design. Within each of the 26 incomplete blocks, the incomplete block of 18 experimental lines was augmented by the addition of two checks (B73 and Mo17, 2016;N28Ht and Mo17, 2017) in random positions, for a total of 20 entries. The entire experiment of 468 unique inbred lines and 52 repeated checks was grown as a single replicate in each environment. Edge effects were reduced by planting a locally adapted maize inbred line around the perimeter of the experiment. Experimental units were one-row plots of 3.05 m (Maricopa) and 4.88 m (San Diego) in length with 1.016 m inter-row spacing. There was a 0.91 m alley at the end of each plot. In each plot, 12 kernels were planted, followed by thinning the stand as needed to limit plant-to-plant competition. We obtained and summarized meteorological data (Table S1) from automated weather stations located at a distance of $200 m or less from the experimental field sites in Maricopa, AZ (Arizona Meteorological Network; http://ag.arizona.edu/ azmet/index.html) and San Diego, CA (Scripps Hydroclimate Weather Station).
harvested for g c analysis, within a week of their anthesis dates), and transferred to a dark room where they were kept with cut leaf bases immersed in water throughout the analysis. Diffusion conductance was measured at a series of timepoints for a selection of lines that had previously been demonstrated to have high g c , and B73 as a reference standard harvested each sampling day. As shown in Figure S1, most lines reached minimum conductance values, indicating stomatal closure, within 30 min, and all lines reached minimum values within 90 min. Based on these findings, we concluded that 2 h in the dark was sufficient to achieve stomatal closure, even for lines with high g c .
The findings from the porometer study were used to inform our phenotyping of the Wisconsin Diversity panel. To inform sampling times and help control for maturity differences, flowering time (days to anthesis, DTA) was recorded as the number of days from planting to initiation of pollen shed for 50% of the plants in a plot. For each plot, g c was evaluated on an adult leaf from five plants when 50% of the plants in the plot were at anthesis. From each of five plants, the leaf subtending the uppermost ear, or the leaf immediately above or below it, was excised at 2.5 cm below the ligule. If there were fewer than five plants to measure in a plot, two, three, or four plants were evaluated to represent the g c of that plot. Cut ends were immersed in water and leaves were incubated in a dark, well-ventilated room for 2 h at 20-22°and 55-65% RH to close stomata while ensuring that the leaves were fully hydrated. Subsequently, each leaf was hung on a line from a boot clip in the same dark, temperature-and humidity-controlled room, after carefully wiping the leaves to remove excess water. In 2017, the cut leaf end was wrapped in parafilm when hung to further reduce the incidence of non-cuticular transpiration. The wet weight of each leaf was then recorded every 45 min over a time period of 3 h, for a total of five measurements per leaf. Leaves were subsequently dried at 60°for 4 d in a forced-air oven, and afterward weighed.
To investigate the degree to which dry weight approximates surface area, we constructed an imaging table that consisted of a digital camera (Canon PowerShot S110) mounted at a fixed distance and folding table top to help flatten the adult maize leaves. In 2017, each experimental leaf that had completed the g c phenotyping process was flattened, imaged, and then oven-dried as earlier described and weighed. Each image contained 4 to 7 leaves, depending on their sizes. Next, images were individually analyzed with a custom ImageJ (Schneider et al. 2012) macro to estimate the surface area (mm 2 ) of each leaf. Even though every attempt was made to flatten the leaves, there were frequent occurrences of where wavy leaves could not be completely flattened. Very strong Pearson's correlations were observed between the plot-level averages of leaf dry weight and surface area in MA (r = 0.93) and SD (r = 0.92) in 2017 ( Figure S2; Table S2). Given these strong correlations and challenges in measuring surface area, leaf dry weight was considered to be a reasonable approximation of leaf surface area, and was used in the calculation of cuticular conductance as described below.
Adult leaf cuticular conductance (g c ) from unit surface area was calculated as follows: where b (gÁh -1 ) is the coefficient of the linear regression of leaf wet weight (g) on time (h), and dry weight (g) is an approximation of leaf surface area.
To screen the phenotypic data (flowering time or g c ) for significant outliers, univariate mixed linear models that enabled the estimation of genetic effects independent of field design effects were fitted as follows: (1) each single environment; (2) the two environments in each location; and (3) all four environments. The model terms included grand mean and check as fixed effects and environment, genotype, genotype-by-environment (G·E) interaction (only for Models 2 and 3), incomplete block within environment, and column within environment as random effects. The Studentized deleted residuals (Neter et al. 1996) generated from these mixed linear models were assessed and significant (a = 0.05) outliers removed. For each outlier screened phenotype, an iterative mixed linear model fitting procedure was conducted for each of the three (1 to 3) full models in ASReml-R version 3.0 (Gilmour et al. 2009). All random terms that were not significant at a = 0.05 in a likelihood ratio test (Littell et al. 2006) were removed from the model, allowing a final best-fit model to be obtained for each phenotype. These final models (Table S3) were used to generate a best linear unbiased predictor (BLUP) for g c and DTA for each line (Table S4).
Variance component estimates from the fitted full models (Table S5) were used to estimate heritability on a line-mean basis (Holland et al. 2003;Hung et al. 2012) for each phenotype in a location (Model 2) and across all four environments (Model 3). Standard errors of the heritability estimates were calculated with the delta method (Lynch et al. 1998;Holland et al. 2003). Pearson's correlation coefficients were used to evaluate the strength of correlation between BLUP values for g c and DTA in all pairwise combinations. The significance of correlations (a = 0.05) was tested using the function "cor.test" in R version 3.5.1 (R core team 2018).

DNA extraction and genotyping
For each inbred line, total genomic DNA was isolated from a leaf tissue sample consisting of bulked young leaves from a single plant. The leaf tissue samples were lyophilized and ground using a GenoGrinder (Spex SamplePrep, Metuchen, NJ, USA), followed by isolation of total genomic DNA using the DNeasy 96 Plant Kit (Qiagen Incorporated, Valencia, CA, USA). The genotyping-by-sequencing (GBS) procedure was conducted on the DNA samples following Elshire et al. (2011) with ApeKI as the restriction enzyme at the Cornell Biotechnology Resource Center (Cornell University, Ithaca, NY, USA). The constructed 192-plex GBS libraries were sequenced on a NextSeq 500 (Illumina Incorporated, San Diego, CA, USA).
We called genotypes at 955,690 high-confidence single-nucleotide polymorphism (SNP) loci in B73 RefGen_v2 coordinates following the method of Baseggio et al. (2019). Briefly, the raw SNP genotype calls were initially filtered to exclude singleton and doubleton SNPs (a minor allele observed in only a single line) and retain only biallelic SNPs having a SNP call rate greater than 10% and minimum inbreeding coefficient of 0.8 per Romay et al. (2013). Additionally, only lines with a call rate greater than 40% were retained. Missing SNP genotypes were partially imputed using FILLIN (Swarts et al. 2014) with a set of maize haplotype donors files having a 4 kb window size (AllZeaGBSv2.7impV5_AnonDonors4k.tar.gz, available at panzea.org). Given that missing genotype data still remained following partial imputation, the imputed genotype data set was further filtered to retain SNPs with a minimum call rate of 0.6, a minimum inbreeding coefficient of 0.8, and a minimum minor allele frequency of 5%. The final complete set contained 235,004 high-quality SNP markers scored on 451 lines that had a BLUP value for g c in one or more environments. The genome coordinates of the SNP loci were uplifted by aligning 101 bp context sequences containing target SNPs (6 50 bp) to the B73 RefGen_AGPv4 reference genome with Vmatch (Kurtz 2003).

Population structure analysis
We estimated population structure from the 451 line · 235,004 SNP genotype matrix in fastSTRUCTURE version 1.0 (Raj et al. 2014), with the number of ancestral populations (K) varying from 1 to 10 using the simple prior. To complement population structure inference with fastSTRUCTURE, a principal component analysis (PCA) was performed on the identical genotype matrix with the prcomp function in R version 3.5.1 (R core team 2018). Next, results from fastSTRUCTURE and PCA were jointly interpreted and complemented with pedigree information (Hansey et al. 2011) to choose K = 6 as the number of subpopulations. Additionally, lines were classified following group assignments of Hansey et al. (2011) (Figure S3A). In fastSTRUCTURE, the simple prior approach was used with K = 6 to calculate the subpopulation composition of each line ( Figure S3B). Lines with an assignment value of Q $ 0.50 were assigned to subpopulations (1, 2, 3, 4, 5, or 6), while those with assignment values of Q , 0.50 for all six subpopulations were classified as admixed ( Figure S3C; Table S6). This information was used to inform a stratified sampling approach used for whole-genome prediction (WGP).

Genome-wide association study
To identify associations between each of the 235,004 SNP markers and the BLUP values of g c from the 451 inbreds, a GWAS was conducted with a univariate mixed linear model that employed population parameters previously determined (P3D) approximation (Zhang et al. 2010) in the R package Genome Association and Prediction Integrated Tool (GAPIT) version 3.0 (Lipka et al. 2012). To control for phenotypic variation due to maturity, population structure, and unequal relatedness, the fitted mixed linear models included BLUPs of flowering time (DTA) of corresponding environments, principal components (PCs), and a genomic relationship (kinship) matrix. The kinship matrix based on the centered IBS method (Endelman and Jannink 2012) implemented in TASSEL 5.0 (Bradbury et al. 2007) was calculated from a subset of 41,259 SNPs remaining after linkage disequilibrium (LD) pruning (r 2 # 0.2) of the complete marker data set in PLINK version 1.09_beta5 (Purcell et al. 2007). The optimal number of covariates (BLUPs of DTA and PCs based on the genotypic matrix) to include in the mixed linear model was determined with the Bayesian information criterion (BIC) (Schwarz 1978). Prior to conducting the GWAS, the remaining missing genotypes for all SNP loci were conservatively imputed as a heterozygote in GAPIT. A likelihood-ratio-based R 2 statistic (R 2 LR ) (Sun et al. 2010) was used to estimate the amount of phenotypic variation explained by the model with or without a tested SNP. The false discovery rate (FDR) was controlled at 10% with the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995).

Linkage disequilibrium
The squared allele-frequency correlation (r 2 ) method (Hill and Weir 1988) was used to estimate LD between all pairs of SNPs on a chromosome in TASSEL 5.0 (Bradbury et al. 2007). The complete set of 235,004 high-quality SNPs was used for LD estimation; however, the missing SNP genotypes remaining after FILLIN imputation were not imputed with a heterozygote value prior to LD analysis. The estimates of LD were used to approximate the physical distance at which LD decayed to genome-wide background levels (r 2 , 0.10) and assess the pattern of LD surrounding GWAS-detected SNPs.

Candidate gene identification
We searched for candidate genes within 6200 kb of the most significant SNP (peak SNP) for each detected genomic region associated with g c . Gene functional annotations (B73_RefGen_v4) of identified candidate genes were obtained from Gramene (http://www.gramene.org/). Additionally, the Gramene Mart tool was used to identify homologs of candidate genes in Arabidopsis thaliana (L.) Heynh.

Transcript abundance analysis for candidate genes
To investigate the patterns of transcript abundance for candidate genes potentially regulating g c of maize leaves, we analyzed the LM-RNAseq dataset of Qiao et al. (2019) preprint generated from epidermal and internal tissues that were LM from seven 2-cm-long intervals (six intervals from 2-14 cm, and one interval from 20-22 cm) from the expanding leaf 8 of maize inbred B73 with three biological replicates and three plants per replicate. The generated RNAseq reads were aligned to B73 genome RefGen_v4 with HiSAT2 (Kim et al. 2015) and counted with HTSeq (Anders et al. 2015). Transcript abundance levels for canonical protein coding genes were characterized as counts per million (cpm) after normalizing against library sizes with the R package edgeR 3.3.2 (Robinson et al. 2010). To identify potential outlier samples, genes with .1 cpm in two or more samples were retained. Next, a PCA was conducted on the cpm values of all the retained genes on all samples sequenced ( Figure S4), and the epidermal and internal samples from interval 4-6 cm in the 3 rd replicate were removed as outliers (circled samples in Figure S4). With the remaining epidermis samples, a gene was declared expressed in the epidermis if its cpm was greater than one for at least three out of the 20 samples.
Whole-genome prediction WGP of g c was conducted using a maximum likelihood approach to ridge regression in the 'rrBLUP' R package (Endelman 2011). The final complete set of 235,004 SNPs with post-FILLIN missing genotype data imputed as a heterozygote was used for WGP. We trained WGP models with g c BLUP values from either MA (two environments), SD (two environments), or AllEnv (four environments), then each of the three models were separately used to predict g c for MA, SD, and AllEnv. Collectively, this resulted in a total of nine prediction scenarios. Fivefold cross-validation as described in Owens et al. (2014) was performed to evaluate the predictive ability for g c in each of the nine scenarios by calculating the Pearson's correlation between observed BLUP values and genomic estimated breeding values. We used a stratified sampling approach to account for the presence of population structure, enabling each fold to have the proportion of subpopulations (1 to 6 and admixed) observed for the whole panel (Table S6). The procedure was conducted 50 times for each scenario, and the predictive ability was calculated as a mean of the correlations.

Data availability
The raw GBS sequence data were deposited in the National Center of Biotechnology

Phenotypic variability
To establish the feasibility of GWAS for g c , the extent of phenotypic variation for adult leaf g c was assessed in the Wisconsin Diversity panel of more than 450 maize inbred lines that was grown in two field locations (SD, San Diego, CA; and MA, Maricopa, AZ) in 2016 and 2017. The climatically contrasting environments of the SD and MA locations (Table S1) had the lowest and highest average g c estimates, respectively (Table 1). There was a moderately strong correlation for g c between years in a location (r = 0.52 and 0.54), whereas correlations were slightly weaker (r = 0.36 to 0.48) between locations in and across years ( Figure S5). With respect to flowering time, g c tended to have a moderately weak negative correlation with days to anthesis (DTA) in each environment (r = -0.14 to -0.50), with the exception of a negative correlation of moderate strength (-0.50) in the MA17 environment ( Figure S5). Comparatively, the first two principal components (PCs) but not the third PC calculated from the SNP genotype matrix (PC1, 3.68%; PC2, 1.83%; and PC3, 1.62%), which infers population structure from genetic data, showed comparable correlations with g c across all four environments (|r| = 0.14 to 0.31). Heritability on a line-mean basis across (AllEnv) locations was 0.71, with a range of 0.63 to 0.67 in locations (Table 1). These findings suggest that most phenotypic variation for g c is due to the genetic variation among individuals in this diversity panel.

Genome-wide association study
The genetic basis of natural variation for g c was examined in the Wisconsin Diversity panel that had been genotyped with 235,004 high-quality SNP markers. We conducted GWAS in (MA or SD) and across locations (AllEnv) using a mixed linear model that controls for population structure (PCs) and unequal relatedness (kinship). In addition, flowering time, which was found to have high heritability (0.88 to 0.93) for MA, SD, and AllEnv (Table S7) and mostly moderate weak correlations with g c (Figure S5), was retained as a covariate for GWAS of g c from MA and AllEnv based on the BIC values, thus attenuating the confounding effect of maturity when estimating allelic effects. A total of nine unique SNPs localized to five different genomic regions were significantly associated with g c from MA and/or AllEnv at a genome-wide FDR of 10% (Figures 1 and S6). Of these nine SNPs, two were associated with g c in both MA and AllEnv, for a total of 11 significant marker-trait associations (Table 2). No significant associations were detected for SNP markers with g c from SD at the 10% FDR level ( Figure S6).
In the Wisconsin Diversity panel, genome-wide LD decayed to background levels (r 2 , 0.1) by 213 kb at the 90% percentile of the r 2 distribution ( Figure S7). This percentile cutoff of the distribution was used to sample the large variance in LD structure presumably caused by rare variants (Wallace et al. 2014a) and allow for the inclusion of potential distant regulatory elements as has been observed for cloned quantitative trait loci (QTL) in maize (Salvi et al. 2007;Studer et al. 2011). Therefore, the genomic search space to identify candidate genes was restricted to 6 200 kb window of the five peak SNPs that each tagged a unique genomic interval, resulting in the inclusion of 76 annotated genes (Table S8). Of these 76 genes, 57 are canonical protein coding genes, while the other 19 encode non-coding RNAs.
The strongest association signal was identified on chromosome 4 (Figure 1), with SNPs S4_30201436 and S4_30201599 comprising the peak associations with g c in MA (P-value 2.68 · 10 27 ) and AllEnv (P-value 1.04 · 10 26 ), respectively. The peak SNP S4_30201599 resides within a microRNA locus (zma-MIR2275a) and is located 1,146 bp from a gene (Zm00001d049479) encoding an INCREASED SALT TOLERANCE1-LIKE1 (ISTL1) protein ( Figure 2; Table S8). ISTL1 is the most probable candidate gene underlying this association signal because it is predicted to function together with the endosomal sorting complex required for transport machinery (ESCRT) and ATPase VACUOLAR PROTEIN SORTING4 (VPS4) in the formation of multivesicular bodies (MVBs) (Hill and Babst 2012;Buono et al. 2016).
The genomic region associated with g c on chromosome 1 was detected in MA, with the signal peak defined by SNP S1_27055151534 (P-value 4.37 · 10 27 ) ( Table 2). The peak SNP is located 8,190 bp from the nearest gene (Zm00001d033835), but this gene encodes an uncharacterized protein. Two genes adjacent to this one encode closely related SEC14 homologs, separated by about 50 kb (Zm00001d033836, $50 kb from peak SNP; and Zm00001d033837, $100 kb from peak SNP) ( Figure S8; Table S8). SEC14 proteins in plants and other eukaryotes function in the transfer of phosphoinositides between different cellular membranes (Huang et al. 2016) and are noteworthy candidates for regulators of g c because of their function in stimulating vesicle formation from the trans Golgi network in yeast (Kf de Campos and Schaaf 2017), given the role of Golgi-dependent vesicle trafficking in cuticle biosynthesis in plants ( McFarlane et al. 2014). Additional plausible candidates on chromosome 1 encode a homolog of ECERIFERUM7 (CER7) (Zm00001d033842) and a CYCLASE-ASSOCIATED PROTEIN (CAP) (Zm00001d033830) located (+) 179,198 bp and (-) 152,746 bp from the peak SNP, respectively ( Figure S8). CER7, a 39-to-59 exoribonuclease family protein, is a core subunit of the RNA degrading exosome that positively regulates the expression of CER3/WAX2/YRE, a key wax biosynthesis gene in Arabidopsis (Kurata et al. 2003;Chen et al. 2003). CAP, a regulator of actin cytoskeleton dynamics, is highly n■ Table 1 Averages and ranges for best linear unbiased predictions (BLUPs) of adult maize leaf cuticular conductance (g c ) in a location (MA and SD) and across all four environments (AllEnv), and estimated heritabilities on a line-mean basis conserved in plants, yeast, flies, and mammals (Balcer et al. 2003;Ono 2013).
An additional association with g c was detected in MA on chromosome 8, defined by S8_152967952 as the peak SNP (P-value 1.92 · 10 26 ) along with two neighboring SNPs (S8_152967929 and S8_152967966). The closest gene (Zm00001d011655) is located 4,067 bp from the peak SNP, and it encodes a MITOGEN-ACTIVATED PROTEIN KINASE KINASE KINASE18. A gene encoding a GLY-ASP-SER-LEU ESTERASE/LIPASE (GDSL lipase) (Zm00001d011661) was identified 170,643 bp from the peak SNP ( Figure S9; Table S8), and it was identified as the most plausible candidate in this region based on the known functions of GDSL lipases in lipid biosynthesis (Girard et al. 2012;Yeats et al. 2012b).
Finally, in AllEnv, g c was also significantly associated with genomic regions on chromosomes 7 and 10 with peak SNPs S2_231152916 (P-value 7.88 · 10 27 ) and S10_144686998 (P-value 1.08 · 10 26 ), respectively. The peak SNP on chromosome 7 is located within a gene encoding CYTOKININ-N-GLUCOSYLTRANSFERASE1 (Zm00001d019250) ( Figure S10; Table S8). The peak SNP on chromosome 10 is located within a gene that encodes b-D-XYLOSIDASE 4 (Zm00001d026415), a xylan-degrading enzyme ( Figure S11; Table S8). The latter is a plausible candidate given that the outer epidermal cell wall and cuticle are closely connected and polysaccharides, for example, pectins and hemicelluloses like xylan or xyloglucans, can be found embedded in some parts of the cuticle (Yeats and Rose 2013; Hama et al. 2019). Therefore, sugar-or cell wall-modifying enzymes might directly or indirectly influence cuticle composition and permeability.

Transcript abundance analysis of identified candidate genes
We analyzed the transcript abundance of the 57 candidate genes that encode a protein to help further prioritize which of them may be Figure 1 Manhattan plot of results from a genome-wide association study (GWAS) of adult maize leaf cuticular conductance (g c ) conducted in Maricopa (MA) and across all four environments (AllEnv). The -log 10 P-value of each SNP tested in a mixed linear model analysis of g c is plotted as a point against its physical position (B73 RefGen_v4) for the 10 chromosomes of maize. The least significant single-nucleotide polymorphism (SNP) at a genome-wide false discovery rate of 10% in MA and AllEnv is indicated by a dashed horizontal orange line and a dotted horizontal orange line, respectively. SNPs significantly associated with g c in MA and AllEnv are represented by blue circles and red triangles, respectively. The most plausible candidate genes within 6 200 kb of the significantly associated SNPs are listed above their corresponding GWAS signal.
n■ Table 2 Significant SNPs detected at false discovery rate (FDR) of 10% through a genome-wide association study of adult maize leaf cuticular conductance (g c ) in a location (Maricopa, MA; San Diego, SD) and across all four environments (AllEnv) involved in the genetic control of g c . Given that the leaf cuticle is produced by epidermal cells, we analyzed LM-RNAseq data of epidermal cells from seven 2-cm intervals excised from the expanding leaf 8 of maize inbred B73 (Qiao et al. 2019 preprint), representing sequential stages in cuticle maturation (Bourgault et al. 2020). Of the 57 candidate genes, 24 were found to be expressed in at least one of the seven sampled intervals ( Figure S12). Of the seven promising candidate genes contributing to the genetic control of g c through potential influence on cuticle development, those encoding CAP, ISTL1 protein, GDSL lipase, b-D-XYLOSIDASE 4, and the two SEC14-like proteins were found to be epidermally expressed (Figure 3), while the homolog of CER7 was not declared to be expressed in the leaf epidermis based on its transcript abundance (Table S9). In general, transcript abundance increased for the ISTL1 protein and GDSL lipase along the leaf developmental gradient from the base (youngest, 2-4 cm) toward the tip (oldest, 20-22 cm) of the leaf, with the exception of a reduced transcript abundance for GDSL lipase in the 20-22 cm interval, at which time cuticle maturation is already completed (Bourgault et al. 2020). In contrast, CAP showed the opposite trend, with decreases in transcript abundance across the gradient from 2-4 cm to 20-22 cm intervals. The two genes encoding the SEC14-like proteins, Zm00001d033836 and Zm00001d033837, had relatively constant transcript abundances from the base to the tip, with a slightly increased abundance at the 20-22 cm interval for Zm00001d033837. The gene encoding b-D-XYLOSIDASE 4 had a peak abundance at the 6-8 cm interval, followed by a progressive decrease in abundance toward the leaf tip. These observed transcript abundance patterns are compatible with roles in cuticle maturation.

SNP ID
Whole-genome prediction of g c Whole-genome prediction (WGP) was performed with the complete marker data set of 235,004 SNPs to assess the feasibility of implementing genomic selection for the genetic improvement of g c in maize breeding populations. Predictive ability was evaluated on g c from MA, SD, and AllEnv in a scheme that used each of the three as a training set. The overall predictive ability was 0.48 across all of the nine trainingprediction combinations, with individual predictive abilities ranging from 0.40 to 0.54 (Figure 4; Table S10). Not unexpectedly, the highest average predictive ability (0.52) was achieved when AllEnv was the training set, followed by MA (0.46) and SD (0.45). Indicative of contrasting environmental conditions between field locations, MA and SD showed lower predictive abilities for each other compared to when MA or SD was used to predict itself. However, AllEnv as the training set resulted in more similar prediction abilities for g c in MA (0.53) and SD (0.49).

DISCUSSION
The cuticle, a crucial barrier to plant water loss (Shepherd and Wynne Griffiths 2006;Xue et al. 2017), has been widely studied using single gene loss-and gain-of-function mutants, and interspecies crosses (Kosma and Jenks 2007;Yeats et al. 2012a;Beisson et al. 2012;Yeats and Rose 2013). However, no prior study has explored natural variation in a single species to elucidate the genetic control of cuticle function as a water barrier on a genome-wide scale. In this study, we focused on leaf cuticular conductance (g c ), a trait of potential agronomic value related to drought tolerance. To that end, we evaluated the phenotypic variation of g c using detached adult leaves in a maize diversity panel in two climatically contrasting field locations (Table S1) and conducted a GWAS and WGP to explore the genetic architecture and the potential genetic gains that could be expected under selection in a breeding program for g c .
Analysis was performed to evaluate the repeatability of g c in and across locations, as well as how g c related to flowering time (DTA). Moderately strong correlations (r) of g c measurements were observed between years in a location, while relatively weaker correlations existed between locations ( Figure S5). Given the lower relative humidity and higher solar radiation and air temperature of MA compared to SD during maize growing seasons (Table S1), the weaker correlative patterns between locations imply that g c is moderately responsive to environmental conditions. Noticeably, weak to moderately strong negative correlations (MA16, -0.33; MA17, -0.50; SD16, -0.14; SD17, -0.20) existed between g c and flowering time collected from the same year in each location. These findings suggest low to moderate levels of confounding between g c and maturity depending on the environment, with the prolonged exposure of the developing cuticle of plants to more extreme weather variables possibly explaining the stronger correlations in the MA field location. Even though these results implicated the need to assess for the confounding effect of flowering time when conducting GWAS, the $twofold range in g c variation, calculated as the ratio between the maximum and minimum BLUP values, in (MA and SD) and across (AllEnv) locations, and the moderately high heritabilities of g c (Table 1) indicate that this phenotype should be very amenable to genetic dissection and prediction in maize.
We conducted a GWAS that resulted in the identification of five genomic regions associated with g c in the Wisconsin Diversity panel. Of these five genomic regions, two were detected only in MA (chromosomes 1 and 8), two in AllEnv (chromosomes 7 and 10), and one (chromosome 4) was detected in both MA and AllEnv (Figure 1; Table 2). Notably, no significant associations at the 10% FDR level were detected only in SD, implying that G·E exists for g c as shown by the between-location correlation analysis ( Figure S5). This is also further supported by comparing the ratio of genetic variance to G·E variance, which was 3.95, 2.57, and 1.81 for SD, MA, and AllEnv, respectively (Table S5). All together, these observations are consistent with those of previous studies showing that both the content and composition of cuticular waxes can be influenced by environmental factors such as ultraviolet (UV) radiation, temperature, humidity, and drought stress (Shepherd and Wynne Griffiths 2006;Xue et al. 2017). It has been reported that drought stress and low humidity increased wax loads in many plant species (Sutter and Langhans 1982;Shepherd and Wynne Griffiths 2006;Koch et al. 2006;Kosma and Jenks 2007), but a different combination of UV intensity and temperature resulted in more complex patterns of wax products (Baker 1974;Riederer and Schneider 1990;Shepherd et al. 1997). Although it is not straightforward to explain how each environmental factor affected g c through wax biosynthesis and deposition, the differences in GWAS results in MA and SD suggests that environment-specific factors in MA such as high temperature, low humidity, and elevated UV radiation (Table S1) might be inducing compositional features that affect water loss relative to plants in SD.
As the major barrier of water vapor diffusion from the intercellular air space to the atmosphere when stomata are closed, the composition and structure of the cuticle can affect water evaporation resistance, thus causing variation in g c . The candidate genes we propose for g c via GWAS encode proteins with functions that can be related to cuticle formation.
Three of the candidate genes implicated by our GWAS in control of g c encode proteins with likely functions in membrane trafficking, a process needed for deposition of cuticle lipids at the cell surface as well as for delivery of lipid transport proteins to the plasma membrane (McFarlane et al. 2014). Two adjacent genes, likely resulting from a recent tandem duplication, encode closely related SEC14 homologs. These belong to a subset of SEC14 homologs in maize lacking the GOLD (Golgi dynamics) and nodulin domains present in SEC14s that have been functionally characterized in plants (Huang et al. 2016;Kf de Campos and Schaaf 2017). Nevertheless, SEC14 domains are well known to function in movement of phosphoinositides (signaling lipids) between cellular membranes; in yeast this function stimulates vesicle formation from the trans-Golgi network (Kf de Campos and Schaaf 2017). Thus, the maize SEC14 proteins we identified may function in vesicle trafficking as well.
Another membrane-trafficking-related candidate encodes an IST1 family protein closely related to Arabidopsis ISTL1 (Buono et al. 2016). Yeast IST1 functions with the ATPase VPS4 to regulate the assembly and function of ESCRTIII complexes, which facilitate the formation of vesicles inside MVBs (Hill and Babst 2012). MVBs are late endosomal compartments that function in degradation of proteins retrieved via endocytosis from the plasma membrane. A function for ISTL1 proteins in degradation of plasma membrane proteins and formation of MVBs in Arabidopsis has been demonstrated (Buono et al. 2016). Thus, maize ISTL1 could function in an endosomal pathway critical for maintenance of plasma membrane-associated proteins that deliver cuticle lipids to the extracellular environment such as ABCG transporters and lipid transfer proteins. Moreover, MVB-like organelles have been implicated as the source of extracellular vesicles that deliver certain proteins and small RNAs to the extracellular environment in plants (Rutter and Innes 2018). This suggests the additional possibility that maize ISTL1 could impact g c because of a role in the formation of MVB vesicles containing cuticle lipids or GDSL lipases required for cutin polymerization, which may be released into the extracellular environment via fusion of MVBs with the plasma membrane. Confirmation of a role for MVBs in cuticle formation would reveal new aspects of membrane trafficking supporting cuticle formation.
CYCLASE-ASSOCIATED PROTEIN1, encoded by the candidate gene on chromosome 1 near peak SNP S1_270551534, has not been associated with cuticle development in previous studies. CAPs are actin monomer-binding proteins that regulate actin dynamics (Qualmann et al. 2000;Hubberstey and Mottillo 2002). Functions for CAPs in actin regulation and actin-dependent processes in Arabidopsis have been demonstrated through genetic and biochemical studies (Barrero et al. 2002;Chaudhry et al. 2007;Deeks et al. 2007). Actin dynamics have been implicated in multiple aspects of membrane trafficking in eukaryotic cells including vesicle formation and vesicle movement (Lanzetti 2007). Thus, CAP could impact g c in maize via regulation of membrane trafficking events supporting cuticle formation such Two of the candidate genes we identified have predicted functions in cuticle formation. One of these is CER7, which functions in Arabidopsis to regulate the biosynthesis of alkanes, one of the major classes of cuticle waxes in this species  and also in maize adult leaves (Bourgault et al. 2020). CER7 encodes an exosomal 39-to-59 exoribonuclease that regulates the expression level of CER3 (Hooker et al. 2007). The CER3 protein acts together with CER1 to catalyze the formation of alkanes (Bernard et al. 2012). The other candidate with a predicted function in cuticle formation encodes a GDSL lipase (Zm00001d011661). GDSL lipases catalyze the hydrolysis of mono-, di-and triglycerols and release free fatty acids and alcohols (Angkawidjaja and Kanaya 2006). Although GDSL lipases are involved in a large number of biological processes in plants, their function in the biosynthesis of cutin is well-recognized (Takahashi et al. 2010;Girard et al. 2012;Yeats et al. 2012b). Moreover, a homolog of Zm00001d011661 in Arabidopsis, AT3G16370, was down-regulated along with other cuticle associated genes in a transgenic DESPERADO/ABCG11 silenced line with changes in levels of cutin monomers (Panikashvili et al. 2010), whereas its homolog in Citrus clementina was reported as the top differentially expressed gene in the fruit epidermis (Matas et al. 2010).
The final candidate gene identified encodes the cell wall-modifying enzyme b-D-XYLOSIDASE 4 (Zm00001d026415). Sometimes the cuticle is described as a specialized lipidic modification of the cell wall, and polysaccharides are known to be deposited in some regions of the cuticle (Yeats and Rose 2013). While compounds like pectins are found to be confined to the cuticular layer close to the cell wall-cuticle border, transmission electron microscopy and nondestructive polarization modulation-infrared reflection-absorption spectroscopy localized hemicelluloses such as xylan and xyloglucan toward the surface of the cuticle, within the cuticle proper of several species (Guzmán et al. 2014;Hama et al. 2019). Therefore, we speculate that cell wall-or sugar-modifying enzymes such as a b-D-xylosidase could have an influence on the polysaccharide content of the cuticle, or might impact cuticle composition or organization by affecting the transit of cuticle lipids across the wall to the cuticle.
Genes encoding CAP, ISTL1, GDSL lipase, b-D-XYLOSIDASE 4, and both SEC14 homologs were declared expressed in epidermal cells of developing maize leaves (Figure 3), where the cuticle matures, indicating that these genes are involved in metabolic and other cellular processes in maize leaf epidermal cells and possibly associated with cuticle development. The abundance of the CER7 homolog transcript, a core subunit of the exosome that regulates wax biosynthesis (Hooker et al. 2007), was too low to be declared expressed, with cpm values ranging from 0.033 to 0.137 in 6 out of the 20 samples. Similarly, very low abundances of this transcript were observed by RNAseq in a number of different tissues from maize inbred B73 (Stelpflug et al. 2016). The increasing/decreasing trends in transcript abundance along the developmental gradient ( Figure 3) are consistent with regulatory roles in cuticle maturation; however, these trends are equally consistent with roles in other developmental changes taking place at the same time. Additional work will be needed to definitively link changes in expression levels of these genes with variation in g c .
In our GWAS results, even after controlling for the confounding impact of flowering time, each locus significantly associated with g c explained 4-6% (R 2 LR-SNP -R 2 LR ) of the variation in the panel (Table 2), thus only accounting for a minor fraction of the estimated heritability. If the genetic architecture of g c is predominated by a large number of loci with small allelic effect sizes, then it is possible that our association population was not of a sample size sufficient to offer the statistical power needed to detect functional variants with small effects (Long and Langley 1999). With only a total of 235,004 SNPs with common minor allele frequencies (5% or greater) scored on the diversity panel, the 'missing heritability' could be also partially attributed to not having a density of SNP markers required to achieve strong LD (r 2 . 0.80) with unscored causal variants that are potentially rare in frequency (Myles et al. 2009;Buckler et al. 2009). Therefore, increasing SNP marker density and the mapping population size are likely important for explaining the missing heritability Figure 4 Predictive abilities of whole-genome prediction for adult maize leaf cuticular conductance (g c ) in locations (Maricopa, MA; and San Diego, SD) and across all four environments (AllEnv) in a scheme that used each of the three as a training set. Gray, orange and blue represent the predictive abilities for g c in MA, SD, and AllEnv, respectively. of g c , which is possibly controlled by a large number of small-effect alleles in maize considering the complex cuticle biosynthesis network and transport mechanisms (Yeats and Rose 2013), as well as factors that affect leaf water trafficking. Such a hypothesis would cohere with findings from the powerful maize nested association mapping panel where genetic models consisting of many loci with small additive effects explain a majority of the genetic variance for a number of complex traits (Wallace et al. 2014a). However, we still cannot exclude the possible importance of rare variants of large effects, genotype-by-environment interaction, epigenetics, epistasis, incomplete penetrance, and other genetic mechanisms (Gibson 2012;Wallace et al. 2014b) in the genetic control of g c in the Wisconsin Diversity panel.
Genomics-assisted breeding approaches for improving plant water-use efficiency and drought tolerance include the use of genetic markers to select for increased water uptake through roots and diminished water loss through stomata and cuticles (Ruggiero et al. 2017). Since a lowered g c provides additional protection to the plant after stomatal closure under water-limited conditions, this potentially genetically complex trait is a candidate for genomic selection, enabled by WGP, in maize breeding programs with a focus on rainfed production environments that are prone to low rainfall. In the WGP analyses that we performed, the predictive abilities for g c were moderately high for the tested nine training-prediction combinations ( Figure 4; Table S10), indicating that genetic gain for g c could be accelerated with WGP in maize breeding populations. Presumably due to influence of contrasting environmental conditions on the phenotypic data collected at the two field locations ( Figure S5; Tables S1 and S4), the prediction abilities were lowest when the MA and SD training sets predicted g c each other (Figure 4; Table S10). However, AllEnv showed similar predictive abilities for MA and SD as for themselves, because AllEnv contained phenotypic information from both locations. Given the range in phenotypic variation observed for g c across locations, it is highly recommended that phenotyping and selection occur in the specific target breeding environments, especially if the focus is on hot, arid environments such as MA. Furthermore, if g c does ultimately have a polygenic genetic basis, genomic selection would be the optimal breeding strategy for g c in maize breeding populations rather than a marker-assisted selection approach designed for only a few loci with large-effects (Meuwissen et al. 2001;Lorenz et al. 2011).