Linear modeling reveals a predominance of cis- over trans- regulatory effects in wild and domesticated barley

Barley, like other crops, has experienced a series of genetic changes that have impacted its architecture and growth habit to suit the needs of humans, termed the domestication syndrome. Domestication also resulted in a concomitant bottleneck that reduced sequence diversity in genes and regulatory regions. Little is known about regulatory changes resulting from domestication in barley. We used RNA-seq to examine allele-specific expression (ASE) in hybrids between wild and domesticated barley. Our results show that most genes have conserved regulation. In contrast to studies of allele specific expression in interspecific hybrids, we find almost a complete absence of trans effects. We also find that cis regulation is largely stable in response to short-term cold stress. Our study has practical implications for crop improvement using wild relatives. Genes regulated in cis are more likely to be expressed in a new genetic background at the same level as in their native background.


Introduction 25
Barley (Hordeum vulgare ssp. vulgare L.) is an important crop for feed, malting and 26 to a lesser extent, human consumption (Ullrich 2010). Among the first crops to be 27 domesticated in the Fertile Crescent about 10,000 years ago (Zohary et al. 2012), 28 barley remains fully interfertile with its wild progenitor H. vulgare ssp. 29 spontaneum K. Koch (H. spontaneum for short). Therefore, H. spontaneum is 30 considered to be a useful source of beneficial alleles for barley improvement. 31 Preferential selection of genotypes with traits beneficial to humans and the 32 intentional breeding have narrowed the genetic diversity and altered gene 33 expression patterns. These molecular changes have caused differences in plant 34 architecture and growth habit between wild and domesticated relatives, 35 collectively called the domestication syndrome (Hammer 1984;Doebley et al. 36 2006). 37 In barley, key domestication and crop evolution genes include Non-brittle rachis 1 38 (btr1) and Non-brittle rachis 2 (btr2) controlling dehiscence of spikelets from the 39 rachis; six-rowed spike 1 (vrs1), which is responsible for lateral floret fertility and 40 may be modified by INTERMEDIUM-C (INT-C); VERNALIZATION1 (Vrn1) which 41 controls the vernalization requirement; covered/naked caryopsis (nud) affecting 42 the adherence of the hull to the caryopsis; and Photoperiod-H1 (Ppd-H1) affecting 43 photoperiod sensitivity (Trevaskis et al. 2003;Turner et al. 2005;Komatsuda et 44 al. 2007;Taketa et al. 2008;Ramsay et al. 2011;Pourkheirandish et al. 2015). 45 These genes were cloned using traditional mapping approaches as their effects 46 are easy to observe given the major phenotypic effect of each gene; however, these 47 tasks were also facilitated by the relative ease for which DNA sequence variation 48 is detected between unrelated genotypes. The task of detecting regulatory 49 variation is more challenging since DNA sequence data alone cannot be used to 50 predict expression. Regulatory variation may arise due to differences in cis or 51 trans factors. Cis factors are physically linked to the genes they control such as 52 promoters or enhancers while trans factors act distally, such as transcription 53 factors. Many studies have been conducted to study the effect of domestication on 54 gene regulation (Rapp et al. 2010;Swanson-Wagner et al. 2012;Koenig et al. 55 2013), although these studies were not designed to disentangle cis and trans 56 effects. 57 In order to achieve separation of cis-and trans-acting factors, Cowles et al. (2002) 58 proposed the comparison of allele-specific expression (ASE) in F1 hybrids to that 59 of the parents. Subsequently, Wittkopp et al. (2004) demonstrated how to find the 60 relative contribution of cis and trans factors. We show this in Supplementary 61 Figure S1 and provide further explanation in the Methods section. Zhang and 62 Borevitz (2009) conducted a similar study using a custom gene expression array 63 with allele-specific probes; however arrays are known to suffer from 64 ascertainment bias (Nielsen 2000). In addition, it can be challenging to design 65 suitable probes that can distinguish between two alleles as demonstrated in yeast 66 by Tirosh et al. (2009). The advent of low-cost RNA-seq enabled the strategy of 67 genome-wide total and ASE to be implemented in a single experiment in 68 Drosophila (McManus et al. 2010). Lemmon et al. (2014) extended this approach 69 to examine regulatory changes between maize and its wild progenitor, teosinte. 70 Cubillos et al. (2014) used the approach to examine the steady-state stress 71 drought response in Arabidopsis. To the best of our knowledge, only one previous 72 study has been published examining ASE in barley (von Korff et al. 2009). In that 73 study, the authors used custom gene expression arrays to measure ASE ratios for 74 30 stress-response genes in five F1 hybrids at different developmental stages. In 75 the present study, we used RNA-seq to estimate the impact of domestication on 76 gene regulation in barley and whether the response to an environmental stress 77 (cold) is affected by domestication. 78

Experimental design 80
The experimental design is summarized in Figure 1 Three cultivars, two landraces and four wild accessions were used in this study 100 for a total of nine parental lines (Table 1). This includes the common maternal 101 reference, Morex (CIho 15773), a six-rowed spring malting cultivar from 102 Minnesota, USA (Rasmusson and Wilcoxson 1979). All other accessions were 103 crossed to Morex, bringing the total number of genotypes to seventeen. Morex was 104 selected because the recently released barley reference genome was generated 105 from bacterial artificial chromosome (BAC) sequences originating from this 106 cultivar (Mascher et al. 2017). The other accessions were selected from an exome 107 capture panel of 267 wild and domesticated barleys in order to maximize 108 geographic and genetic diversity (Russell et al. 2016). Principal component 109 analysis (PCA) of 1.7 million bi-allelic SNPs from these data separate wild samples 110 based geography and domesticated samples based on breeding history as well as 111 row type ( Figure S2). The target space is 60 Mb or about 75% of the barley gene 112 space (Mascher et al. 2013 For those samples with a mapping rate between 50-79%, the source of 124 contamination is the barley stripe mosaic virus (BSMV, Figure S4) while the 125 sample with the lowest mapping rate (30%) is contaminated with human DNA 126 ( Figure S5). BCC131 samples were included in our analyses anyway because the 127 effect of sequence contamination, reduced coverage, merely reduces statistical 128 power for variant calling. While this reduction decreases power for ASE and 129 differential expression analysis, the data for genes that remain informative are 130 still useful. 131

Principal component analysis 132
After checking our gene expression data quality, we examined the data to see if it 133 matches our expectations to ensure that it is reliable. Principal component 134 analysis was conducted using kallisto-derived expression data. The first principal 135 component explains 25% of the variance and separates the parental genotypes 136 from Morex, the common maternal parent for all hybrids. The hybrids cluster 137 between Morex and the parents, as expected for hybrids ( Figure 2A) is from Turkey, and the landrace HOR1969 is from Tibet ( Figure 2B). The third 144 principal component explains 8% of the variance. Samples along this component 145 cluster by accession; however, only HOR1969 loosely clusters separately from the 146 others ( Figure S6). The fourth principal component explains 7% of the variance 147 and separates samples according to treatment ( Figure 2C). The PCA results show 148 that samples cluster according to the principal factors in our experiment (i.e., 149 generation, genotype and treatment). Therefore, the data may be used to 150 confidently determine allele-specific expression. 151 Exome capture and SNP calling 157 The PCA described above was conducted using kallisto, but these results are for 158 overall expression and are not allele-specific. HISAT2 was used for allele-specific 159 mapping of reads. In order to determine which allele a transcript originated from, 160 exome capture data were collected for one individual of each hybrid. Exome 161 capture and (in some cases) whole genome shotgun data already exist for the 162 parents. By comparing SNPs between parental accessions and confirming these 163 SNPs in hybrids between these accessions and transcript (RNA) data, we were 164 able to unambiguously assign transcripts to one allele or the other. This section 165 describes in detail how we found these SNPs. Variant call format (VCF) files 166 resulting from the exome capture analysis pipeline were more rigorously filtered 167 in R. Coverage filters were applied to produce a set of high-confidence SNPs for 168 each cross combination. These filters required a minimum quality score of 5 for 169 both homozygous and heterozygous genotype calls, a minimum read depth of at 170 least 10 for both homozygous and heterozygous genotype calls. The minimum 171 fraction of heterozygous call was set to 1 since we were working with hybrids. The 172 maximum fraction of missing genotype calls was set to 0.9 and the minimum 173 minor allele frequency was 0.05. Genomic Data Structure (GDS) files were 174 produced using the R package SeqArray (Zheng et al. 2017) which contain clear 175 differences between reference and alternate alleles. Exome capture mapping 176 statistics are presented in Figure S7. The number of informative SNPs and genes 177 are presented in Table 2. SNPs are informative if they reside in genic regions since 178 SNPs are only useful for ASE when they are transcribed. SNPs in regulatory 179 regions are important for ASE, but they cannot be detected from RNA-seq data. 180 The number of informative genes for BCC131 (2,589) is lower than expected 181 based on the other landrace, HOR1969 (6,850 genes) as a result of lower coverage 182 due to contamination as discussed above ( Figures S3-S5). Otherwise, the general 183 pattern of wild accessions being more diverged from Morex (8,318 184 informative genes) compared to cultivars (4,296 and 4,634 genes for Barke and 185 Igri, respectively) is, unsurprisingly, observed. 186  191 or not there was allele-specific expression. Initially, we followed the methods used 192 by McManus et al. (2010); however, as we inspected expression plots further, we 193 realized that genes assigned to the trans only category differed greatly in their 194 expression levels between replicates ( Figure 3B). Use of the linear model resulted 195 in a drastic reduction in the number of genes with trans effects including trans 196 only, cis + trans and cis × trans ( Figure 4A, Table 3) compared with the binomial 197 method used by (McManus et al. 2010)( Figure 4B, Table 4). This is in line with 198 what other authors have found in other organisms (Goncalves et al. 2012;Osada 199 et al. 2017). Another notable trend is that the number of genes assigned to the 200 conserved class of regulatory variation is higher when using a linear model. 201 Approximately 80% of the total number of genes were assigned to this class using 202 a linear model versus ~20% using the binomial/Fisher's exact test (Tables 3-4). 203 In addition, regulation of gene expression appears to be stable in response to 204 environmental stress, consistent with the findings of Cubillos et al. (2014). 205 Regulatory category plots for all crosses are given in Figure S8. 206 207 Figure 3. Example profiles for two genes illustrate the effect of the statistical 208 differences between the binomial testing and linear model methods. Both 209 methods agree in A because of the similar expression values between replicates; 210 however, in B the large differences in expression between replicates mean that 211 confidence in the true expression value is low. 212   The numbers of genes in each regulatory category are roughly similar for control samples and those in response to environmental stress, but since these tests were conducted independently, we wanted to know how similar these lists are. To answer this question, we found the intersection of gene lists for each comparison (Table 3-4). The results show that regulatory category assignments are robust to environmental stress, especially for genes with conserved regulation. On average, 94% (~90-96%) of genes in this category are present in both treatments for all crosses. Since it appears that results for category assignments are similar between treatments, we wanted to know what if we could detect more trans effects by considering control and cold treatments together to gain additional replicates, in order to gain statistical power within the linear model. The results (Table 5, Figure  S8) are similar to when each treatment was analyzed separately (Table 3). A moderate increase in the number of trans, cis + trans and cis × trans effects were observed, but not to the same extent as found by (McManus et al. 2010).

Expression of known cold responsive genes
In addition to looking at general expression patterns, we are also interested in the expression of known cold-responsive genes. Therefore, we looked into the expression patterns of these genes including Vernalization1 (VRN1) and Cold-Regulated 14B (COR14B). The expression of both VRN1 (HORVU5Hr1G095630) and COR14B (HORVU2Hr1G099830) matched our expectations. Morex and Barke (spring types) have higher expression levels of VRN1 than Igri (a winter type), both landraces, as well as all wild accessions ( Figure 5A). Expression of VRN1 is maintained at low levels in wild and winter barleys until it has endured a prolonged period of cold exposure, or vernalization. This vernalization requirement is evolutionarily advantageous because flowering will only occur when prevailing environmental conditions are favorable. Hybrids have VRN1 expression levels that match those of the spring types, demonstrating that the loss of a vernalization requirement is dominant. The expression profiles are similar for both control and cold treatments, which is also expected since VRN1 expression only increases after several weeks of exposure to cold temperatures and our samples were only exposed to cold for three hours. The expression of VRN1 in the hybrid confirms that the hybrid shows the correct inheritance patterns. Therefore, we believe that our other results are reliable. The cold-responsive gene COR14B, however, shows a clear increase in response to cold treatment ( Figure 5B). This is also expected, since the plant response to chilling occurs rapidly (Cattivelli and Bartels 1989).

Dominant vs. additive inheritance
In addition to regulatory categories discussed earlier, we are also interested in examining the mode of inheritance of genes in our dataset. Many genes exhibit Mendelian inheritance (dominance vs recessive). However, many other genes exhibit quantitative or additive inheritance. Still other inheritance modes (heterosis) also exist. Heterosis, also known as hybrid vigor, occurs when expression of a gene is outside of the range of the parental values (i.e., overdominance). We were interested in exploring the distribution of these inheritance modes in our data. The summary of the modes of inheritance is reported in Table 6 for control samples and Genes showing dominance together represent about another third of differentially expressed genes. In nearly every cross, more Morex alleles are dominantly expressed than the paternal allele. This could be an effect of Morex being the maternal allele, but it could also reflect a tendency of domesticated alleles to be more highly expressed than wild alleles. The one case where the paternal allele has more dominantly expressed alleles than Morex involved Igri, a winter cultivar, under control conditions. Otherwise, the trend seems to be that the numbers of dominant genes are more equally distributed between the two parents for cultivars (Barke and Igri) and landraces (BCC131 and HOR1969) than for the wild accessions (FT11, FT67, FT279 and FT581). Another quarter to one third (27.4%-36.2% control and 24.5%-34.3% cold) of all differentially expressed genes were placed into the ambiguous category and a handful of others did not fit into any of the other categories. It is difficult to speculate which category these genes truly belong to. We might assume that they would fall into one of the main three categories (Additive, Morex dominant or paternal allele dominant) in a proportional manner, but we cannot state this with certainty.

Discussion
We are interested in understanding the effect of domestication on patterns of gene expression and regulatory variation in barley. To accomplish this, we combined the use of ASE on a small panel of wild and domesticated barleys and their F1 hybrids with a cold stress treatment according to established methods (Cowles et al. 2002;Cubillos et al. 2014;Lemmon et al. 2014). Several lines of evidence indicate that the approach worked and our results are reliable. First, samples cluster according to generation (Figure 2A), accession ( Figure 2B) and treatment ( Figure 2C). The expression profiles of cold-responsive genes such as VRN1 and COR14B also behave as expected ( Figures 5A-5B). Of 39,734 high-confidence genes in the barley genome, we were able to quantify ASE for between 2,589 (BCC131) and 8,940 (FT581) genes (  Lemmon et al. 2014), we expected to find a similar number of genes regulated in cis and trans; however, we found almost a complete absence of genes regulated in trans. The increased expression of cold response genes (COR14B, Figure 5B) after cold treatment suggest that the cold treatment induced transcription factors (TFs) to elicit a response to cold. Since TFs act in trans, some trans effects are expected; however, a small number of TFs may be more plausible than hundreds or thousands of trans-acting genes observed in earlier studies, to minimize pleiotropic effects (West et al. 2007). In general, genes with trans effects may not cause pleiotropic effects if they do not disrupt highly connected nodes in a network (Jeong et al. 2001;Fraser et al. 2002). Further, TFs do not necessarily cause large pleiotropic effects. Work in C. elegans shows that mutations in the Ras signaling pathway that activate multiple TFs are more deleterious than mutations affecting only TFs (Kayne and Sternberg 1995). In our present study, the genes regulated in trans according to the linear model do not appear to have any great significance. The expression levels of these genes are low and are plagued with missing data (e.g., some of the genes are expressed in one genotype, but not another) and annotations are ambiguous. It is also possible that the parameters of our analysis are too strict, resulting in false negatives; however, other studies have likely suffered from false positives. Clearly, a method is needed that rejects trans effects that are truly absent, but accepts real trans effects.
Evidence for regulatory changes in response to environmental stress is absent from our data, in agreement with Cubillos et al. (2014). However, we cannot rule out that the use of a different environmental stress (high temperature, drought or salinity) could induce a more variable response. Cubillos et al. (2014) also found that roughly half of the genes in their samples had compensatory effects, meaning that cis and trans effects have opposite effects. In contrast, in we found that half of our genes had conserved effects. In addition, Cubillos et al. (2014) observed an increase in the number of genes with trans effects that resulted in a change in direction in response to the environment, rather than a change in magnitude, compared to genes with cis effects. We were not able to make such a comparison, since genes with trans effects are virtually absent in our dataset. Wittkopp et al. (2008) found a greater amount of cis regulatory expression differences between species rather than within species, which could also explain why trans effects were more pronounced in studies that examined expression differences between Drosophila species (McManus et al. 2010). However, Osada et al. (2017) also noted large variances in their samples; therefore, our hypothesis that differences observed for trans regulation are likely to be false positives as a result of statistical artifacts seems to be plausible.
The observation of a greater number of cis-compared with trans-acting factors has important implications for the use of crop wild relatives in plant breeding. Insights into gene regulation in barley such as this will help to exploit wild genetic resources in elite germplasm (Schmalenbach et al. 2009). In nature, it appears that cis effects preferentially accumulate, likely due to fewer pleiotropic effects compared with trans effects (Prud'homme et al. 2007). Similarly, in plant breeding, genetic background is known to influence the expression of genes due to epistatic interactions (Kroymann and Mitchell-Olds 2005;Blanc et al. 2006). For novel quantitative trait loci (QTL) introgressed into elite germplasm to be useful, the beneficial trait must be expressed in the elite background. Genes regulated in cis will be more likely to be expressed at the same level in a novel background as in their native background when their regulatory sequence is co-inherited due to linkage, whereas co-inheritance of trans regulators will occur less frequently due to independent segregation. Introgression of a gene as well as its trans regulator would be complicated enough, but could also have deleterious effects in the new genetic background if the trans regulator epistatically affects the expression of offtarget genes. The recipient background may also regulate the introgression through trans regulators. One way to study this experimentally is to use nearisogenic lines (NILs) that contain as many of the total possible genes in small introgressions throughout the genome. Guerrero et al. (2016) conducted such an experiment in tomato. They showed that introgressed genes tend to be down regulated while native (non-introgressed) regions tend to be up regulated. The authors concluded that cis-and trans-regulation have roughly equal contributions to expression divergence.
The cis regulatory regions of genes can be large, extending for thousands of kilobases such as the case with Teosinte Branched 1 (tb1) in maize, which has at least one regulator from 58-69 kb upstream from the 5' start site (Clark et al. 2006). Therefore, it is possible that recombination may occur between a cisregulatory sequence and the gene it controls. However, cis regulatory regions are not well defined. This possibility highlights one limitation of the applications of our study. Due to our experimental design, we can only infer the presence and relative contribution of cis-or trans-acting regulation, but we cannot map these regulators; therefore, we do not know the genomic position of these regulators. An experimental approach known as expression quantitative trait loci (eQTL) mapping allows gene expression to be mapped as quantitative traits in experimental populations or by association genetics. These studies allow for mapping of regulatory elements; however, it is still not always clear at what distance threshold an eQTL would be acting in cis or in trans, since these distance thresholds are often arbitrary (Lagarrigue et al. 2013). In addition, eQTL are more properly referred to as local or distant, rather than cis or trans (Lagarrigue et al. 2013). These studies are also more difficult and expensive because they require a large mapping population to be both genotyped and assayed for genome-wide expression values.
Alignment bias due to polymorphism or structural variants is a well-documented problem with ASE studies (Degner et al. 2009;Stevenson et al. 2013) and our dataset is no exception due to the use of a single genotype (Morex) as a reference.
In part due to these limitations, a single reference genotype is no longer considered to be sufficient to capture the full diversity present in a given species. The concept of the pan-genome posits that any species has a set of genes present in all accessions (the core genome), genes that are present in some, but not all accessions (the dispensable genome) and lineage-specific genes that are only present in a single accession. In this context, additional reference genomes are needed. Other barley genotypes, such as Barke and FT11 are not available at present. There is a barley pan-genomic project underway, at which point these genotypes and others will be available. Please see Monat et al. (2018) for a summary of this topic. For now, it is necessary to interpret our results with caution. When the genomes of these other accessions do become available, it will become possible to re-analyze these data to measure the impact of the reference bias.
The availability of additional reference genomes will also allow for re-analysis of these data with a Bayesian approach that allows for direct comparison of environmental effects (León-Novelo et al. 2018). Additional reference genomes are necessary because the method incorporates the number of RNA reads which align equally well to both parental genomes.

Growth conditions
Plants were grown in a growth chamber with a 12 h photoperiod with temperatures of 22°C and 18°C during light and dark periods, respectively. After one week of growth, when the first leaf of each accession was fully expanded, half of the plants were moved to a cold room at 4°C for 3 h. The response to chilling occurs rapidly in barley (Cattivelli and Bartels 1989), so this short cold treatment is sufficient to induce a physiological response. After the 3 h cold treatment, the first leaf of each individual from both groups was harvested and immediately frozen in liquid nitrogen before being moved to storage at -80°C. Each cold treatment (11:00) and tissue harvest (14:00) was conducted at the same time of the day for each replicate to avoid confounding factors associated with circadian rhythm. The experimental design is shown in Figure 1. The experiment was replicated four times. For accessions that either failed to germinate or grew poorly, a fifth attempt was made to obtain additional replicates. As a result, most samples were replicated four times. A few samples have only three replicates: FT67 hybrid cold, FT581 parent control, both FT581 hybrid control and cold, and Morex parent control. Two samples have only two replicates: Barke hybrid control and Igri hybrid cold. One sample, Barke hybrid cold, was not able to be replicated despite repeated efforts to get more data.

RNA extraction, sequencing and data analysis
Frozen leaf tissue (-80°C) was homogenized by grinding to a fine powder in 1.5 ml tubes with metal beads two times for 30 s each (1 min total) at 30 Hz using a mixer mill (Retsch GmbH, Haan, Germany). Tubes containing the samples were submerged in liquid nitrogen between grinding to ensure samples did not thaw during the process. Once all samples were ground, RNA was extracted using RNeasy ® mini kits (Qiagen) according to manufacturer's instructions. To remove any DNA contamination, samples were treated with Ambion TM DNase (ThermoFisher Scientific) according to manufacturer's instructions. RNA quality and integrity were checked with an Agilent 2100 Bioanalyzer (Agilent Technologies) and a Qubit TM 2.0 fluorometer (ThermoFisher Scientific), respectively.
Where possible, three individuals of each parent or hybrid were planted for each replicated treatment. The healthiest plant (e.g., not yellow or stunted) was selected for harvesting. After RNA extraction was carried out according to the methods described above, high quality RNA (mass ≥ 1 μg, volume ≥ 20 μL, concentration ≥ 50 ng/μL, RIN ≥ 6.3, and 260/280 and 260/230 ≥ 2.0) samples were submitted for sequencing.
In total, 123 NEB Next ® Ultra TM RNA libraries with an average insert size of 250-300 bp were sequenced (paired-end, 2× 150 cycles) on an Illumina HiSeq 2500 machine. RNA sequencing was done by Novogene while exome capture sequencing was performed at the IPK sequencing center. RNA-seq data were quantified using both the pseudoalignment software kallisto v. 0.43.0 (Bray et al. 2016) and HISAT2 (Kim et al. 2015). The abundance files from kallisto and HISAT2 were separately loaded into the R statistical environment (R Core Team 2012) for further analysis. Gene abundance estimates from kallisto were normalized using edgeR and limma (Robinson et al. 2010;Ritchie et al. 2015) and the voom transformation (Law et al. 2014) was applied to account for the mean-variance relationship of RNA-seq data. These data were used to calculate the variance using the matrixStats package (Bengtsson 2016). The 1000 genes with the highest variance were used for PCA. Kallisto was used to find overall expression patterns while HISAT2 was used for allele-specific expression. All raw RNA sequence data are available from the European Nucleotide Archive (ENA) under accession numbers PRJEB29972. Accession numbers for individual samples are provided in Table S1.

DNA extraction and exome capture
In order to select high-confidence variants for allele-specific expression analysis using a genomic control, an exome capture assay was applied for the eight hybrid genotypes (Mascher et al. 2013). Exome capture data for the parental genotypes may be found in Russell et al. (2016). Genotype matricies for SNPs (https://doi.org/10.5447/IPK/2016/4) and indels (https://doi.org/10.5447/IPK/2016/5) for parental accessions are available through e!DAL (Arend et al. 2014). The raw sequence data for these parents were deposited into the ENA and the accession codes are available in Supplementary  Table 1 of Russell et al. (2016). For the present study, hybrid DNA was extracted using a DNeasy ® kit (Qiagen). DNA concentrations were measured using a Qubit TM 2.0 fluorometer (ThermoFisher Scientific) and all samples were above 20 ng μl -1 . DNA integrity was verified using a 0.7% agarose gel, which showed that DNA from each sample was intact. Sequencing was performed using an Illumina Hiseq 2500 machine (2 x 100 bp, insert size = 320 bp). Captured reads were mapped against the BAC-based Morex reference sequence (Mascher et al. 2017) with BWA-MEM (Li et al. 2013). Coverage was determined using the depth command from SAMtools (Li 2011) using only properly paired reads. Mapping statistics are available in Figure S3.

Allele-specific transcript quantification and normalization
The R package limma (Ritchie et al. 2015) was used for the analysis of ASE using a linear model approach. Briefly, allele-specific counts were converted into a matrix and rounded to the nearest integer. Counts were then normalized using edgeR (Robinson et al. 2010) to account for differences in total read count between samples and stored in a differential expression list. A design matrix was created using each combination of generation × accession × treatment as a single factor. The voom transformation was applied to the count matrix to account for the mean-variance relationship of RNA-seq data. The linear model was created by fitting the voom-transformed (Law et al. 2014) count matrix to the design matrix. Differentially expressed genes were identified using the contrasts specified in the contrast matrix. For example, the expression level of each individual parent was contrasted to Morex to decide whether the parents were different from each other. Subsequently, the parental alleles within the hybrid were compared to each other to decide if their expression was different from each other.
These allele-specific counts were also used as input for differential expression analysis. We used the differential expression analysis to assign the mode of inheritance for differentially expressed genes, which is described below.
Allele-specific counts were derived from SNPs in the RNA-seq data that were corroborated by a genomic control. First, informative SNPs were detected in the exome capture data. SNPs were considered informative in a specific cross if they the parents carried different alleles in homozygous state. In addition, the successful genotype calls in the hybrid exome capture data were required. Then, we determined how many reads supported the reference allele or the alternate allele in RNAseq data for parents hybrid and calculate the DV/DP ratio (depth of the variant allele vs. total read depth). Information for multiple SNPs were combined at the gene level by merging the SNP information with gene information in the R statistical environment and summing up DP and DV values for all SNPs in a gene. Low DV/DP ratios indicate that more reads originated from the reference (maternal = Morex) allele while a high DV/DP ratio indicate more reads originated from the alternate (paternal) allele. A DV/DP ratio of 0.5 means that both alleles are expressed equally. Genes with less than 50 reads across all samples were filtered out before further analysis. The design matrix was created by considering each combination of accession, generation and treatment as a single factor. The linear model created by fitting the model specified in the design matrix to the voom-transformed (Law et al. 2014) count matrix. Genes may be assigned to one of seven regulatory categories described by McManus et al. (2010). Genes with significant (FDR adjusted p-value ≤ 0.01 using Benjamini-Hochberg procedure) expression differences between parents and parental allele expression levels matching that of their respective parent in the hybrid were assigned to the cis only category ( Figure S1A). In contrast, genes with significant expression differences between parents, but not between parental alleles in the hybrid were assigned to the trans only category ( Figure S1B). Figures S1C and S1D show the expectations for cis + trans and cis × trans categories, respectively. Full descriptions of regulatory categories may be found in McManus et al. (2010).

Dominant vs. additive inheritance
We used our gene expression dataset to find whether genes were inherited in a dominant or an additive manner. We use the classifications given by Albert et al. (2018) to make assignments. First, we used the subset of differentially expressed genes from each cross as described above. Genes were assigned as Morex dominant if the expression of the gene in the hybrid was greater than in the low parent and matching the expression of Morex. Genes were called recessive when the expression in the hybrid was lower than Morex and matched that of the low parent. We renamed these as "paternal allele dominant" in the final tables. Additive genes were those genes which had intermediate expression values between the two parental alleles. Genes which had higher expression values than both parents and Morex was the high parent were placed into the Morex overdominant category. Genes which had higher expression values than both parents and the paternal parent was the high parent were assigned to the "paternal allele overdominant" category. For the genes which remained unclassified, we used log2 FC expression values below 1 and greater than -1 for each contrast to assign these genes to the "ambiguous" classification. Even after this step, some genes remained unassigned. We report these genes as "not assigned". The number of genes in each category was small, but for two crosses (Barke and FT67) the number of unassigned genes was relatively high at 174 and 101, respectively. support from the German Research foundation (Deutsche Forschungsgemeinschaft, DFG) to Martin Mascher (Grant ID: MA6611/2).

Conflict of Interest
The authors declare no conflict of interest. Figure S1. Expected relative expression levels for A) Cis only effects, B) Trans only effects, C) Cis + Trans and D) Cis × Trans. Bar plots for the father (blue) and mother (yellow) are the result of the combined effects of both alleles in the respective accession. Two green bars (middle) each represent a single allele in the hybrid individual. F1 F is the hybrid allele derived from the father while F1 M is the hybrid allele derived from the mother. Figure S2. Geographical distribution of wild barleys used in this study (except FT279 from Afghanistan, which is not in the frame); and Principal component analysis based on exome capture data from (Russell et al. 2016) that was the basis of selection of parents for use in this study.     Figure  2C except that HOR1969 samples are colored in red and all other samples are colored in black. Sample shapes designate generation. Circles are parental samples while hybrids are triangles. Morex is indicated with a "×" symbol. Figure S7. Exome capture mapping statistics for the eight hybrids used in this study. Figure S8. Log2 ratio plots of parents (x-axis) vs. parental alleles in the hybrid (yaxis) for all crosses when treatments were not considered separately and instead grouped as additional replicates.