Extending the Genotype in Brachypodium by Including DNA Methylation Reveals a Joint Contribution with Genetics on Adaptive Traits

Epigenomic changes have been considered a potential missing link underlying phenotypic variation in quantitative traits but is potentially confounded with the underlying DNA sequence variation. Although the concept of epigenetic inheritance has been discussed in depth, there have been few studies attempting to directly dissect the amount of epigenomic variation within inbred natural populations while also accounting for genetic diversity. By using known genetic relationships between Brachypodium lines, multiple sets of nearly identical accession families were selected for phenotypic studies and DNA methylome profiling to investigate the dual role of (epi)genetics under simulated natural seasonal climate conditions. Despite reduced genetic diversity, appreciable phenotypic variation was still observable in the measured traits (height, leaf width and length, tiller count, flowering time, ear count) between as well as within the inbred accessions. However, with reduced genetic diversity there was diminished variation in DNA methylation within families. Mixed-effects linear modeling revealed large genetic differences between families and a minor contribution of DNA methylation variation on phenotypic variation in select traits. Taken together, this analysis suggests a limited but significant contribution of DNA methylation toward heritable phenotypic variation relative to genetic differences.

To date, there are few cases showing a direct relationship of variable DNA methylation tied to downstream phenotypes (Ong-Abdullah et al. 2015;He et al. 2018). In addition, changes in DNA methylation are often confounded with genetic variation present between samples. That is, variable DNA methylation is clearly apparent in cases where genetic variation exists and it has been difficult to disentangle these when associated with a phenotype of interest (Eichten et al. 2014). Many studies address this by examining DNA methylation variation through inbreeding generations, so called epi-Recombinant Inbred Lines (epiRILs) or mutation accumulation lines, which display phenotypic variation (Shaw et al. 2000;Becker et al. 2011;Schmitz et al. 2011;Cortijo et al. 2014). However, such estimates of the contribution of DNA methylation on phenotypic differences are made in isolation. Therefore, a more holistic approach in which DNA methylation variation is assessed along with genetic polymorphisms and environmental variation to reveal the relative importance of these heritable factors. This can be viewed as a part of the 'Extended Genotype' of an organism, that is, sources of heritable variation that are largely overlooked and/or misinterpreted in relation to more traditional genotype assessments such as SNPs (Eichten et al. 2014).
The model grass Brachypodium distachyon (B. distachyon) provides an ideal system to examine the impact of the extended genotype on natural populations. Advantages include having a relatively small and fully sequenced genome alongside a growing array of genetic resources, displaying a wide climatic distribution resulting in phenotypic diversity, rapid generation times leading to increased rounds for (epi)genetic selection, and a small stature that facilitates systematic study. Pertinently, multiple genetically similar accessions of Brachypodium have been identified in different environments across Turkey (Vogel et al. 2009;Wilson et al. 2019). This set of 'BdTR' accessions provide a unique natural set of germplasm to investigate the impact of 'extended genotype' signals, such as DNA methylation.

MATERIALS AND METHODS
Plant germplasm, growth, and phenotyping Germplasm for experiments was selected based on previously established SNP-based genotypic relationships as described in (Wilson et al. 2019). Briefly, the genetic distance matrix, as defined for all samples, was used to identify clades of individuals with minimal genetic variation (similar to that observed between technical replicates) and for which seed was available. The 95 selected B. distachyon accessions were grown in biological triplicate alongside 17 biological replicates of the reference genotype Bd21 for direct comparison to an inbred background (Initiative and The International Brachypodium Initiative 2010). Two seeds per pot were planted 2.5 cm below surface in 5x5x8 cm pots in a steam-pasteurized 75:25 martin's soil:washed river sand mix. Trays of 14-16 pots were watered with tap water, covered in plastic film, and placed at 4°in the dark for seven days for seed stratification. Trays were subsequently moved into modified Conviron growth chambers which have been fitted with 7-band LED light panels and control light intensity, quality, chamber temperature, and humidity every 5 min (Brown et al. 2014). These were planted under simulated conditions for regions near Istanbul, Turkey season-shifted for planting in both northern-hemisphere spring (April) and fall (August) to investigate how plants respond to either a rapid-cycling or overwintering environment. A pair of identical specialized growth chambers were used to simulate these Turkish climates (41.146, 29.026, 72m elevation) modeled using SolarCalc (version E-2014) (Spokas and Forcella 2006). The chambers updated temperature and 7-band LED light quality information every 5 min (Sup Figure 1) (Brown et al. 2014). Spring started with average high temps of 17°and reached a peak summer daily high at 29°. Night time lows ranged from a season start of 10°to 21°by the peak of summer (simulated July). In contrast, fall planting started with average high temps of 19°and reached a minimum winter daily high at 7°. Night lows ranged from a season start of 14°to 5°(limited by chamber specifications) by mid-winter (simulated February). The two chambers performed the same modeled conditions offset by six months. This allowed for a 'spring' and 'fall' planting in which plants entered the simulated conditions on April 26 th and October 16 th . Plants were regularly watered using tap water over the course of the experiment when standing water was not observed in trays. All plant measurements were taken by hand, including plant height, third leaf length and width, tiller count, ear count, and flowering time (Sup Table 1) (Rebetzke and Richards 1999;Wilson et al. 2015Wilson et al. , 2019. Tissue harvest and DNA extraction Spring planting flag leaf tissue was harvested 10 weeks after emergence. In contrast, fall planting leaf 3 and leaf 7 tissue was harvested 14 weeks after emergence due to limited overwinter growth (Sup Table 2). Tissue was harvested directly into 96 well plates on liquid nitrogen for DNA extraction. Leaf tissue was ground using the TissueLyser II (Qiagen) and DNA was extracted using the Invisorb DNA Plant HTS 96 Kit (cat 7037300400; Stratec Biomedical) using the manufacturer's protocol. DNA was quantified using the Quant-iT High Sensitivity dsDNA Assay (cat. Q33120; ThermoFisher).

Whole genome bisulfite sequencing and analysis
Whole genome bisulfite sequencing libraries were created using the Accel-NGS Methyl-Seq DNA Library Kit (cat. 30096; Swift Biosciences; Ann Arbor, MI). The standard protocol was modified for third-reaction volumes throughout using 27 ng bisulfite converted gDNA. Initial shearing step of gDNA was omitted. Bisulfite conversion was conducted using the EZ-96 DNA Methylation-Gold MagPrep kit (cat. D5043; Zymo Research). 80 ng of gDNA in 45 ml H 2 O was used for low concentration gDNA. Dual indexing of samples was performed with the Methyl-Seq Dual Indexing Kit (cat. 38096; Swift Bioscience) using 11 PCR cycles.The library underwent a final dual-side SPRI cleanup upon completion of the library preparation (0.6x right-side SPRI followed with 0.85x left-side SPRI) to compensate for the lack of physical shearing of the initial DNA. Libraries were quantified using the Caliper LabChip GXII (PerkinElmer) and equalmolar pooling of 96 libraries was performed. Pools of 96 libraries were sequenced using the HiSeq 2500 (Illumina). The subset of samples selected for high-coverage methylome analysis were run as a pool of 96 samples across a flow cell (8 lanes) on the HiSeq 2000. All sequencing was conducted at the ANU Biomolecular Resource Facility.
Raw sequencing reads had 59 trimming of 6 bp to eliminate library bias in methylation state, along with base quality and adapter trimming using Trim Galore!. Trimmed reads were mapped to the Bd21 reference (v2.1) using Bismark (v0.13.0) under bowtie1 alignment mode (Krueger and Andrews 2011). Alignments were subsequently deduplicated and all context (CG/CHG/CHH) methylation data were extracted via bismark_methylation_extractor using default parameters.
Differentially methylated regions (DMRs) were called using HOME (Srivastava et al. 2019). HOME-pairwise module was implemented with the following cutoffs: methylation difference $ 20%, number of CGs $ 4 and DMR length $ 50 bp (parameters: -delta 0.2, -minc 4, and -len 50). Briefly, HOME first computes the P-values using weighted logistic regression to model methylation levels and variance between accessions and replicates. Here, the weighted logistic regression uses coverage as a weight. Thereafter, the difference in methylation level is weighted by the p-values to compute the DMRs. Moreover, the spatial correlation present among neighboring cytosine sites is captured by moving average smoothing and the use of weighted voting for histogram based features used by HOME.

Dissection of phenotypic variance
We used linear mixed modeling to estimate the amount of phenotypic variance that can be explained by the differences in CG methylation between accessions while accounting for the genetic effects due to differences between the clone groups. Analyses were performed for each trait and environment separately and was fitted using the R package "Asreml-R". Initially, we construct a reduced model with one random effect:

y¼ XmþZ g vþe
Where y is a vector of phenotypes, m is the population mean, n is the vector of breeding values treated as a random effect with a N(0, s 2 G) distribution, and e is the vector of residual effects. Z g is the design matrix allocating clone groups to individual plants and has been defined as the inverse of the genomic relatedness matrix G. The genomic relatedness matrix was estimated following (VanRaden 2008) that has been adjusted for almost completely homozygous organisms. First, from whole-genome sequencing data from one accession of each clone group, we build the matrix X with the dimensions of number of clone groups (n = 7) and number of SNPs (m = 510,230). SNPs with at least one copy of the minor allele and no more than one missing value were included in X. X was rescaled to account for allele frequencies to produce W where W ij = M ij -p j , where p j is the allele frequency for SNP j. Finally, G is calculated as G = WW'/ P p j (1-p j ). We then construct a more complicated model with an added random effect to capture the variance due to differences between the accessions in CG methylation states: Here, r is a vector of fitted values (N(0, s 2 M) that we interpret as the methylation equivalent of breeding values for which the  "methylation heritability" can be estimated from the variance in these values. Z m is the design matrix allocating accessions to plants defined by the inverse of a methylation similarity matrix described below.
We pooled the reads from bisulfite DNA sequencing of the replicates of each accession separately for each environment. For each accession, we scored the proportion of methylated reads compared to the total number of reads at all CG sites. We further pooled CG sites into 200bp windows. The proportion of methylated reads for a specific window was treated as missing data if an accession had less than 10 reads total. Windows with more than 50% of accessions containing missing values were excluded from further analysis, additionally, windows that did not contain a single methylated read were also excluded. In total, 838,231 windows were retained in the spring conditions and 778,183 windows in the fall conditions.
After filtering, we retained a matrix of proportions of methylated reads, Q, with the dimensions of accessions (n = 83) by the number of 200bp windows. Columns of Q were scaled to have a mean of zero and unity of variance. The methylation similarity matrix was then calculated as M = QQ'/N, where N is a matrix of pairwise number of windows with non-missing data between two accessions.
We used a log likelihood ratio test to determine if the methylation state of the accessions explained a statistically significant amount of phenotypic variance. Here the statistic of 2 times the difference in log likelihood between the models was tested against a chi-squared distribution with one degree of freedom. All variance components were mean-standardized following (Houle 1992).

Selection of Brachypodium accessions and genetic profiling
Previous genomic analyses of thousands of Brachypodium accessions was conducted using genotype-by-sequencing (GBS) culminating in a selection of diverse B. distachyon germplasm as a HapMap set (Wilson et al. 2019). In that study it was observed that some Turkish accessions were highly similar based on SNP assessment, but were located in different geographical regions ( Figure 1A). These 83 Turkish accessions (BdTR) were grouped into seven nearly genetically identical 'families' based on prior analysis (Sup Table 1). This was generally consistent with the accession's initial naming based on phenotypic similarity (Vogel et al. 2009), though BdTR1 and BdTR2 belonged to the same genotype family. The BdTR accessions within each genetic family were widely dispersed geographically and found at different elevations ( Figure 1B) consistent with wide migration and little recombination of the highly selfing species.
As a genetically distinct global outgroup, we included the allopolyploid relative Brachypodium hybridum (B. hybridum). Previous studies highlighted two genetically-similar families consisting of 12 accessions collected from across southern Australia. HYB1 consisted of accessions initially collected in South Australia and one location in New South Wales. HYB2 consisted of a single geographic location with multiple individuals ( Figure 1C). The selection of germplasm for this study provided a unique, natural system in which to study new heritable variation across a series of genetic families containing minimal intra-family variation within, but with substantial inter-family genetic and phenotypic variation.

Methylome variance reflects genetic distance
To examine the DNA methylation state of accessions here, lowcoverage whole genome bisulfite sequencing was conducted for all 604 experimental plants (Sup Table 2). Samples were correlated over 100 bp genomic tiles to obtain average genome-wide methylation state for all three sequence contexts. CG methylation almost completely reproduced the known genetic relationships -separating B. hybridum and grouping the B. distachyon accessions based on previously determined family clone groups (Figure 2A). Individual biological replicates were often most similar, however, variation observed near terminal edges may be due to the limitations of low coverage sequencing data. This is apparent in the non-CG methylation contexts (CHG and CHH) in which sample relationships showed less complete recapitulation of genetic relatedness ( Figure 2B-C). This was likely due to low coverage preventing accurate measurements for these methylation contexts that occur at lower levels. Therefore we focus the remainder of the analysis on CG sites.
The nested design of this experiment allowed for further quantification of DNA methylation variation at a variety of levels. We identified differentially methylated regions in the CG context (CG-DMRs) using HOME for single-sample DMR calling of pairwise comparisons between: (I) replicates of the same accession (basal level of stochastic differences), (II) accessions within families (intra-family), and (III) between families (inter-family, Table 1, Figure 3). This was performed for the complete dataset (low coverage) and repeated for a subset of samples (family group 1 and 6) sequenced to greater depth (Sup Table  2). Indeed, greater sequencing depth improved the power of HOME to detect CG-DMRs. As comparisons were made between samples with increasing genetic distance, a greater mean number of DMRs were identified (of greater magnitude, length, and CG count). On average, fourfold more CG-DMRs were called between, compared to within, family groups, highlighting previous observations that genetically variable samples contain many more DNA methylation variants. Nonetheless, a substantial number of CG-DMRs were still observed within accessions and family groups, which could contribute to heritable and even adaptive phenotypic variation. This warranted further systematic analysis to quantify the phenotypic variance that can be attributed to SNPs and DMRs.

Dissection of phenotypic variance into genetic and epigenomic components
We used a mixed linear model framework to dissect the phenotypic variance of fitness related traits across multiple levels of biological organization. Our experimental design allowed us to investigate the differences between the species B. distachyon and B. hybridum, and within our B. distachyon samples. We estimated the relative contribution to phenotypic variance of additive genetic effects (as characterized by whole-genome sequence distance among seven clone groups), CG methylation state, which was obtained through low-coverage bisulfite sequencing of 83 accessions, and the residual variance being estimated via the biological replicates of the accessions. In the analyses that considered all hierarchical levels of variation, the species contrast of B. distachyon and B. hybridum explained most of the phenotypic variance in flowering time in spring conditions (84%) where B. distachyon flowered considerably later than B. hybridum. A similar pattern was observed in the fall overwintering conditions; however, this relationship explained less variation overall (20%) as vernalization overwhelms the genetic signal. We also found differences in leaf morphology between the two species, where B. hybridum had both wider and longer leaves compared to B. distachyon. This difference in leaf width and length between the species accounted for 68% and 94% of variance in spring conditions and 76% and 98% in fall conditions for these traits respectively. Within B. distachyon, we observed substantial genetic variance across all traits in both conditions, where heritabilities ranged from 18 to 63% in spring conditions and from 10 to 89% in fall conditions ( Table 2). The addition of the random effect of CG methylation state to the models rarely increased the amount of phenotypic variance explained compared to the simpler model where only additive genetic effects were considered. In this study, only two traits had statistical support that the phenotypic variance explained by methylation patterns was greater than zero and, in both cases, this result was dependent on the conditions in which the traits were measured (Table 2). We estimated that an additional 10% of variation in flowering time is due to methylation in spring conditions (x 2 = 7.29, d.f. = 1, P = 0.007), we also found methylation patterns explained an additional 16% of the variation in plant height measured in fall conditions (x 2 = 13.14, d.f. = 1, P , 0.001). We would like to stress the importance of the joint estimation of genetic and methylation contributions to phenotypic variance as these two explanatory variables were highly correlated, i.e., closely related individuals shared similar methylation patterns. In this study, ignoring the underlying DNA sequence variation in our sample population would have led us to grossly overstate the importance of methylation state on phenotypic variance. For example, we found that methylation state only explained 5% of variation in the tiller count in fall conditions (x 2 = 3.39, d.f. = 1, P = 0.065), while additive genetic effects explained 60% of the variation (Table 2). Re-analyzing the data with the removal of the random effect of additive genetics from the model, the amount of variation explained by the methylation state dramatically increased to 51% (x 2 = 133.33, d.f. = 1, P , 0.001). Overall, we show that in some cases significant phenotypic variation can arise due to changes in DNA methylation that can potentially be inherited by subsequent generations, however, we stress that variation in methylation state is largely dependent on the underlying genetic variation and needs to be analyzed together.

DISCUSSION
Epigenomic diversity continues to be considered as a new source of variation in heritable traits that could be harnessed for plant breeding (Springer and Schmitz 2017). However, genetic and epigenomic polymorphisms are often considered independently making it difficult to determine their relative contribution toward heritable phenotypic variation. Here, utilizing a diverse range of clonal Brachypodium accessions, grown in two distinct controlled environments (simulating local climates of Istanbul spring and fall), we systematically quantified the proportion of phenotypic variance, across numerous adaptive plant traits, attributable to genetic polymorphisms (SNPs), variations in DNA methylation, and the environment. Whereas the majority of phenotypic differences across all traits could be attributed to genetic polymorphisms, CG methylation demonstrated an additive effect in particular environments, such as plant height under longer overwintering (fall) conditions.
Although the known population structure of the selected accessions in this study was recapitulated with methylation data, the level of methylation variation was linked to the overall genetic distance between any given sample(s). The methylation variance captured likely reflects an under-estimation of the true difference, given the limitations of low coverage sequencing. Our DMR calling scans for regions of differential methylation larger than single loci, and are therefore more robust to lower coverage. The key of our experimental design is to have higher biological replicates so any observed differences are reproducible. Overall the results are likely conservative given low coverage per sample, while alternative study designs might identify false positives of biological differences among few samples of higher coverage. Nonetheless, these results are similar to reports across natural Arabidopsis thaliana populations (Schmitz et al. 2013b;Hagmann et al. 2015;Kawakatsu et al. 2016) and a landrace Wheat collection (Gardiner et al. 2018). Methods as described here, can dissect this confounding between genetics and epigenomic components, in particular DNA methylation as studied here, to capture an additional piece of the missing heritability. This highlights the importance of dissecting chromatin variation along with correlated genetic variation to explain phenotypic variation. In the majority of cases where samples are genetically similar, phenotypic variation of quantitative traits is also limited. Despite this, our analyses also revealed that epigenomic distance (CG methylation) could explain additional phenotypic variation of select traits in particular environments.
Going forward, in experimental systems in which genetic sources of variation are not known, it would be advantageous to separate epigenomic (chromatin modifications) from genomic changes (SNPs, SVs) to be able to jointly test effects on a phenotype. Fortunately, advances in genomics, including long reads that also type base modifications, are upon us making this a practical solution (Simpson et al. 2017;Kelleher et al. 2018;Ni et al. 2019). In essence, a joint understanding of genetic and epigenomic relationships could be a new standard for the examination of quantitative trait variation.

ACKNOWLEDGMENTS
The authors would like to thank the Australian Plant Phenomics Facility for providing plant growth facilities and the National n■ Table 2 Sources of phenotypic variation Breakdown of the phenotypic variance from the independent effect of SNPs (i.e., broad-sense heritability), the equivalent effect of CG methylation state, and environment variance. Variance components were estimated using linear mixed models and have been mean-standardized and multiplied by 100 for readability. Bold components were found to be statistically significant at P < 0.05 using log-likelihood ratio tests between full and reduced models