Weaker selection on genes with treatment-specific expression consistent with a limit on plasticity evolution in Arabidopsis thaliana

Abstract Differential gene expression between environments often underlies phenotypic plasticity. However, environment-specific expression patterns are hypothesized to relax selection on genes, and thus limit plasticity evolution. We collated over 27 terabases of RNA-sequencing data on Arabidopsis thaliana from over 300 peer-reviewed studies and 200 treatment conditions to investigate this hypothesis. Consistent with relaxed selection, genes with more treatment-specific expression have higher levels of nucleotide diversity and divergence at nonsynonymous sites but lack stronger signals of positive selection. This result persisted even after controlling for expression level, gene length, GC content, the tissue specificity of expression, and technical variation between studies. Overall, our investigation supports the existence of a hypothesized trade-off between the environment specificity of a gene’s expression and the strength of selection on said gene in A. thaliana. Future studies should leverage multiple genome-scale datasets to tease apart the contributions of many variables in limiting plasticity evolution.


Introduction
Organisms must cope with ever-changing environmental conditions to survive and reproduce.If these changes in condition cannot be avoided or escaped, phenotypes that respond to environmental variation through phenotypic plasticity may be adaptive.For example, under low light, the same Arabidopsis thaliana genotype will produce more or larger leaves to capture more energy for photosynthesis (Pigliucci and Kolodynska 2002).Plastic responses are partly controlled through differential gene expression between environments (Scheiner 1993;Schlichting and Smith 2002).Understanding the evolution of these conditionspecific expression patterns could help reconcile the diversity of plastic responses observed in nature and engineer organisms to overcome environmental challenges.
However, not all organisms can respond plastically to environmental change, so it is crucial to understand the processes that constrain plasticity (Van Kleunen and Fischer 2005).These constraints are usually characterized as either costs, where plasticity reduces fitness in some way, or limits to the evolution or maintenance of plasticity (DeWitt et al. 1998).Decades of research has attempted to measure the costs associated with plasticity (reviewed in Schneider 2022) but studies often fail to detect costs or find costs that are weak or restricted to certain environments (Van Kleunen and Fischer 2005;Van Buskirk and Steiner 2009;Auld et al. 2010).Theory also predicts that there will be strong selection to alleviate costs (Murren et al. 2015).Thus, limits may be more important than costs in shaping the evolution of plasticity.
Recent work suggests that relaxed selection can limit plasticity evolution (Snell-Rood et al. 2010;Murren et al. 2015).For instance, one hypothesis posits that genes are often under selection for environment-specific expression to minimize deleterious pleiotropy (Snell-Rood et al. 2010;McGuigan et al. 2014;Huber et al. 2017).However, narrowing the range of environments where a gene is expressed also reduces the opportunity for negative selection to act on deleterious mutations in the gene (Kawecki 1994;Whitlock 1996; Van Dyken and Wade 2010).The accumulation of deleterious mutations could then cancel out any selective benefits of the environment-specific expression pattern.Thus, a trade-off arises between a gene's degree of environment-specific expression and the strength of negative selection acting on said gene.If we assume that environment-specific expression generally contributes to phenotypic plasticity, then this trade-off would potentially limit the maintenance of plasticity (Kawecki 1994;Snell-Rood et al. 2010).Whether such a trade-off exists has not yet been tested, but the deposition of expression data from hundreds of experimental treatments across hundreds of labs into public repositories now enables approximating environment specificity as treatment specificity and linking treatment-specific expression to the rate of evolution.
To investigate the influence of treatment-specific expression on evolutionary rates, we compiled a dataset of gene expression data across over 200 treatments from over 300 peer-reviewed studies in A. thaliana.We annotated RNA-sequencing runs from these studies using standardized ontologies, then processed all of them with the same pipeline.Finally, we combined the resulting gene expression matrix with estimates of selection based on within-species polymorphism and between-species divergence to investigate whether genes with treatment-specific expression were under weaker negative selection.

RNA-seq run annotation
We amassed an initial set of RNA-seq runs from the Sustech Arabidopsis RNA-seq database V2 (Zhang et al. 2020) (http://ipf.sustech.edu.cn/pub/athrdb/)excluding any samples not associated with a publication or lacking a tissue type label.On 2022 May 24, we also downloaded all run metadata from the Sequence Read Archive (SRA) returned by the following search term: ("Arabidopsis thaliana"[Organism] AND "RNA"[Source]) OR ("Arabidopsis thaliana"[Organism] AND "RNA-Seq"[Strategy]) OR ("Arabidopsis thaliana"[Organism] AND "TRANSCRIPTOMIC" [Source]).All SRA runs were linked to their associated publications, if possible, using Entrez.Any SRA run numbers that we could not link to a PUBMED ID or DOI were omitted.We then manually removed all SRA runs that originated from transgenic, mutant, hybrid, grafted, cell culture, polyploid, or aneuploid samples based on information in the SRA metadata and associated publications.Runs from any naturally-occurring A. thaliana accession were included.We also omitted SRA runs that focused on sequencing non-coding RNA (ncRNA-seq, miRNA-seq, lncRNA-seq, sRNA-seq, etc.).After applying these criteria, any bioprojects with 8 or fewer SRA run numbers remaining were also omitted.
All runs were labeled with treatment and tissue-type descriptions using the Plant Experimental Conditions Ontology (PECO) and the Plant Ontology (PO) (Cooper et al. 2018), respectively, based on information in their associated publications and SRA metadata.In our analysis, control exposure was defined as long-day conditions (12 h light exposure or longer, but not constant light) and growing temperatures in the range of 18 • -26 • , inclusive, without explicit application of stress or nutrient limitation.Warm treatments were defined as 27 • or higher, while cold treatments were defined as 17 • or lower.Any studies that did not report both day length and growing temperature were omitted.Any runs that could not be linked to treatments based on their annotations in the SRA or Sustech databases were also omitted.Treatment with polyethylene glycol (PEG) was categorized as drought exposure.Samples from plants that were recovering from stress were categorized according to the growth conditions of the recovery state instead of the stressed state.When appropriate, we labeled samples with multiple PECO terms.For example, a sample that was subjected to both heat stress and high light stress would get two PECO terms (one for each stress) and be treated separately from samples subjected to only heat stress or only light stress.Tissue-type labels were eventually collapsed to the following categories: whole plant, shoot, root, leaf, seed, and a combined category of flower and fruit tissues.The flower and fruit tissue categories were combined because of their developmental relationship and small size relative to the other categories.In the end, we had a dataset of 24,101 sequencing runs from 306 published studies.

RNA-seq run processing
All RNA-seq runs were processed using the same workflow to remove the effects of bioinformatic processing differences between studies on expression level.First, runs were downloaded using the SRA toolkit (v2.10.7), but 90 runs were not publicly available and thus failed to download.All successfully downloaded runs were trimmed using fastp v0.23.1 (Chen et al. 2018), requiring a minimum quality score of 20 and a minimum read length of at least 25 bp (-q 20 -l 25).Trimming results were compiled using multiqc v1.7 (Ewels et al. 2016).All trimmed runs were then aligned to a decoy-aware transcriptome index made by combining the primary transcripts of the Araport11 genome annotation (Cheng et al. 2017) with the A. thaliana genome in salmon v1.2.1 (Patro et al. 2017) using an index size of 25 bp.The salmon outputs of each run were then combined with a custom R script to create an gene-by-run expression matrix.We omitted 423 runs with a mapping rate <1%, 215 runs with zero mapped transcripts, and 18 genes with zero mapped transcripts across all runs from further analysis.We note that although this cut-off does not exclude samples with more modest mapping rates (e.g.20-60%) the choice to include these samples was to avoid removing large chunks of data as "outliers" and analyzing only those samples that conform to our expectations.

Whole-genome sequence data processing
We downloaded whole-genome sequencing data for 1,135 A. thaliana accessions from the 1,001 genomes project panel (SRA project SRP056687) (Alonso-Blanco et al. 2016) using the SRA toolkit.All runs were trimmed using fastp (Chen et al. 2018), requiring a minimum quality score of 20 and a read length of at least 30 bp (-q 20 -l 30).Trimmed reads were then aligned to the A. thaliana reference genome using BWA v0.7.17 (Li and Durbin 2009).The alignments were sorted and converted to BAM format with SAMTOOLS v1.11 (Danecek et al. 2021), then optical duplicates were marked with picardtools v2.22.1.Haplotypes were called for each accession, then combined and jointly genotyped with GATK v4.1.4.1 assuming a sample ploidy of 2, heterozygosity of 0.001, indel-heterozyogsity of 0.001, and minimum base quality score of 20.Invariant sites were included in the genotype calls with the-include-non-variant-sites option.All calls were restricted to only coding sequence (CDS) regions based on the Araport11 annotation by supplying a BED file of CDS coordinates made with bedtools (v2.29.2).Following Korunes and Samuk (2021), variant and invariant sites were filtered separately using both GATK and vcftools v0.1.15(Danecek et al. 2011).Variant sites were filtered if they met any of the following criteria: QD < 2, QUAL < 30, MQ < 40, FS > 60, HaplotypeScore > 13, MQRankSum < −12.5, ReadPosRankSum < −8.0, mean depth < 10, mean depth > 75, missing genotype calls > 20%, being an indel, or having more than two alleles.In the end, 1,915,859 variant sites across all coding sequences were retained for further analysis.Invariant sites were filtered if they met any of the following criteria: QUAL > 100, mean depth < 10, mean depth > 75, missing genotype calls > 20%.Finally, variant sites were annotated using snpEff (Java v15.0.2) (Cingolani et al. 2012b) and variants labeled as either missense or synonymous were separated into different files using SnpSift (Cingolani et al. 2012a).

Selection estimated from between-species divergence
We identified 1:1 orthologs between the primary transcripts of A. thaliana and Arabidopsis lyrata with Orthofinder v2.5.4 (Emms and Kelly 2019).For each 1:1 ortholog, we aligned their protein sequences with MAFFT L-INS-I v7.475 (Katoh and Standley 2013), then converted the protein alignments to gapless codon-based alignments using pal2nal v14 (Suyama et al. 2006).Using the gapless codon-based alignments, we estimated dN/dS using the Nei and Gojobori (1986) method implemented as a custom Biopython v1.79 script and implemented through the codeml program in the PAML package v4.9 (Yang 2007).Unlike codeml, the custom Biopython script also returns counts of nonsynonymous (N) and synonymous sites (S) within each gene as described in Nei and Gojobori (1986), which we later used to calculate nucleotide diversity per nonsynonymous site (π N ) and per synonymous site (π S ).Before proceeding with more analyses, we confirmed that our estimates of dN and dS were consistent between our Biopython script and codeml (Supplementary Fig. S5, Pearson correlations dN : ρ = 0.9998, dS : ρ = 0.9809).The outputs of the Biopython script were used in all subsequent analyses.

Nucleotide diversity at nonsynonymous sites
Nucleotide diversity (π) was calculated for each gene with pixy v1.2.3.beta1(Korunes and Samuk 2021) three times: once using all sites (both variant and invariant), once using missense sites plus invariant sites, and once using synonymous sites plus invariant sites.These estimates were then converted to π, π N , and π S , respectively, by first multiplying the per site estimate output from pixy by the number of sites included in the analysis.Then, to get π N and π S , the values from analyses of missense plus invariant, and synonymous plus invariant sites were divided by the N and S values for each gene, respectively, as determined by the method in Nei and Gojobori (1986).

Tajima's D
We next calculated Tajima's D for each gene.First, we calculated π and Watterson's Theta (θ W ) for each variant site i within a gene (π i and θ Wi , respectively).In this case, π i was calculated as: Where n i is the number of sequenced chromosomes with nonmissing genotypes for variant i, p i1 is the frequency of the reference allele, and p i2 is the frequency of the alternative allele.Then, θ Wi was calculated as: Where a i is: This calculation of θ Wi is equivalent to the usual calculation of θ W with the number of segregating sites set to one.Next, the variance in Tajima's D was calculated for each site as: This is equivalent to equation 38 in Tajima (1989) with the number of segregating sites set to one.Finally, the results of the above calculations were combined in the following formula: To get Tajima's D for each gene, we then averaged across the D i values for all the variant sites within a gene.

Direction of selection (DoS)
Counts of nonsynonymous and synonymous polymorphisms within each gene (P N and P S , respectively) were determined with bedtools (v2.29.2).The number of nonsynonymous and synonymous differences (D N and D S , respectively) between A. thaliana genes and their 1:1 A. lyrata orthologs, if present, were estimated during the process of calculating dN/dS in Biopython as described above.These values were then used to calculate the direction of selection (DoS) (Stoletzki and Eyre-Walker 2011) as follows: We chose this metric, as opposed to the proportion of amino acid substitutions driven by positive selection (α), because it is less biased than α (Stoletzki and Eyre-Walker 2011) and was successfully used in studies similar to ours (Paape et al. 2013).Furthermore, we found that α often returns uninterpretable negative values when applied to A. thaliana, perhaps because of an excess of slightly deleterious polymorphisms (Nordborg et al. 2005) due to their predominantly selfing mating system (Charlesworth 1994).

Treatment specificity
Treatment specificity (τ) was estimated separately for runs from each tissue type using the following formula (Yanai et al. 2005): Where x is the vector of average expression values of a gene in each treatment category, measured in transcripts per million (TPM), and where N is the number of treatment categories.
Dividing by N-1 means that τ varies between zero and one, where zero indicates no specificity and one indicates exclusive specificity to a single treatment.We used this metric of specificity because it is consistently more robust than others (Kryuchkova-Mostacci and Robinson-Rechavi 2017) and is normalized by the number of treatments included, making it comparable across datasets.We also applied the same formula to calculate tissue specificity in several different treatment conditions.

Simulating correlations between average expression and specificity index
Average expression level and measures of expression specificity are correlated by definition because genes with more treatment/ tissue-specific expression will have lower average expression across all treatment/tissue categories.We ran two simulations to better illustrate the factors driving the correlation between average expression and the specificity index, τ.In both simulations, we generated 1,000 random matrices, where each element x ij represented the expression of gene i in experiment j, by sampling from a zero-inflated negative binomial distribution: Where the size and probability parameters of the negative binomial component were N = 100 and p 1 = 0.1, respectively, while the probability of an expression value being non-zero was p 2 = 0.4.All matrices included five groups of columns, with five columns per group, representing replicates of tissue/treatment groups.For both simulations, we averaged across columns within each group to simulate the calculation of tissue/treatment-wide averages.We then applied the formula for τ across the rows of this averaged matrix to get expression specificity.In one simulation, we calculated expression level by averaging across the rows of the expression matrix.In a second simulation, we excluded experiments where a gene was not expressed (x ij = 0) from the calculation of average expression.

Average expression, length, GC content, family size
Calculating the average expression of each gene was a three-step process.First, we averaged together runs with matching SRA experiment IDs because these runs represented technical replicates of the same biological sample and treatment conditions.Second, we partitioned our gene-by-experiment expression matrix by the tissue type each sample came from.Finally, for each tissue type's expression matrix, we averaged across all of the expression values of each gene across all experiments, excluding values <5 transcripts per million (TPM).We excluded values <5 TPM from the average expression calculation to avoid a high correlation between average expression and treatment specificity, as has been reported in previous studies (Slotte et al. 2011).This high correlation occurs because an environment-specific gene will by definition also have low average expression across environments it is rarely expressed in.Furthermore, we excluded values <5 TPM to avoid including small expression values that could be artifacts of alignment error.
The length and GC content of each gene was measured using the bedtools nuc command (v2.29.2) and included each gene's introns and untranslated regions when present.We included introns and untranslated regions in the estimate of gene length because they play important roles in determining rates of protein evolution (Castillo- Davis et al. 2002;Eisenberg and Levanon 2003).Finally, the family size for each gene was estimated as the number of A. thaliana genes in their respective orthogroups output by OrthoFinder.

Partial correlation analysis
Not all treatment-tissue combinations were sampled in the overall RNA-seq dataset, causing confounding between the treatment and tissue labels.We resolved this in two ways.First, we subset the data to only the treatment conditions where all tissue types were represented.Second, we subset the data by tissue type and analyzed each subset separately.For each subset, we calculated partial spearman correlations between treatment specificity and our measures of selection (dN, π N , Tajima's D, and DoS) after accounting for average expression (excluding values TPM < 5), gene length, and GC content using the ppcor R package (Kim 2015).For partial correlation analyses involving π N and Tajima's D, we also controlled for gene family size.We did not account for gene family size in partial correlation analyses involving dN or DoS because these metrics apply to only genes with one family member in this study.When calculating partial correlations involving dN, we excluded any genes with saturating divergence (dS > 1).All statistical analyses and data visualizations used R v4.0.3 and used color palettes in the scico R package (Crameri 2018;Pedersen and Crameri 2022).

Surrogate variable analysis
We recalculated treatment specificity and repeated all partial correlation analyses after correcting each data subset for technical between-experiment variation (i.e.batch effects), following an approach from (Fukushima and Pollock 2020).Batch effects include variables that influence gene expression measurements but are not of interest to this study, such as the sequencing platform and the library prep protocol used in each experiment.First, with our data already subset by tissue type, we further subset to only include treatments with RNA-seq runs from at least two studies.This minimizes confounding between-treatment variation with the technical between-experiment variation we aimed to account for.We then applied surrogate variable analysis (SVA) using the svaseq() function within the SVA package (Leek and Storey 2007) to each of these subsets.Briefly, SVA models gene expression as: Where x ij is the expression of gene i in experiment j, μ i is the average expression of gene i across all experiments, and y i is the value of a predictor variable of interest for gene i.Furthermore, f (y i ) gives the deviation of gene i from its average expression based on the value of y i and e ij is the residual error.SVA takes this model and partitions the residual variance, e ij , into: Where  L ℓ=1 γ ℓi g ℓj gives the summed effects of L unmodeled variables (g ℓj ) for each gene and e * ij gives the gene-specific noise in expression.SVA does not attempt to estimate what the unmodeled variables influencing expression are, but rather find a set of vectors (the surrogate variables) that span the same space as g: Where each h k is a surrogate variable and each λ k gives the effects of each surrogate variable on gene expression.For our analyses, our predictor variable y i was treatment type.To get a measure of expression where the effects of surrogate variables are removed, we then subtracted off the effects of surrogate variables from both sides of the above equation.
Where x ij −  K k=1 λ ki h kj gives us our expression values accounting for the effects of surrogate variables.The net result here is a reduction in the amount of unexplained or seemingly stochastic variation in expression because sources of variation have been attributed to "surrogates" that span the same space as real batch variables.We also conducted principal component analysis in R before and after SVA to verify the removal of batch effects.

Summary of tissue differentiation, treatment specificity, and selection in overall dataset
To understand how treatment specificity of gene expression affects evolutionary rates of proteins, we queried the Sequence Read Archive for all A. thaliana RNA-seq experiments published before May 2022.We then annotated these experiments with standardized tissue and treatment ontology terms, manually filtered the dataset, and then processed all RNA-seq runs with a standardized pipeline.The number of sequencing experiments associated with each combination of tissue and treatment labels is summarized in Supplementary Table S1.Overall, the most sampled tissue category was leaf (4,642 experiments) followed by root (3,348 experiments), whole plant (2,492 experiments), seed (1,866 experiments), shoot (1,106 experiments), then fruit and flower (266 experiments).The four most sampled treatment categories were control (5,701 experiments), cold air exposure (675 experiments), short day length (561 experiments), and short day length plus Botrytis cinerea exposure (407 experiments).Any sequencing runs that shared an SRA experiment ID were averaged to produce individual gene expression values for each SRA experiment.
We first looked at the distribution of mapping rates across all RNA-seq runs.The median mapping rate was 72.39% (Supplementary Fig. S1) and we excluded runs with a mapping rate <1% from further analyses.We next performed a principal components analysis (PCA) on the expression matrix and observed strong differentiation between root and non-root tissues along PC2 (Fig. 1).We also observed that nearly all genes had some degree of treatment specificity in their expression (Fig. 2a, Supplementary Fig. S3).Furthermore, only a small proportion of genes had strong signatures of selection based on dN/dS, π N /π S , DoS, or Tajima's D (Fig. 2, b to d, Supplementary Fig. S2).The treatment specificity of expression was lower on average in flower and fruit tissue compared with the other tissues (Supplementary Fig. S3).However, tissue specificity did not vary widely depending on the treatment condition (Supplementary Fig. S4).

Omitting samples with low expression disentangles expression level and specificity
Genes that are only expressed in one treatment or tissue will, by definition, have low mean expression across all environments or tissues (Wright et al. 2004).Thus, we sought a method of calculating expression level that was independent of treatment specificity.To better understand the relationship between average expression and treatment specificity, we calculated correlations between treatment specificity and expression level while either including or excluding low expression values (TPM <5) on our real RNA-seq dataset.We found that excluding low expression values decreased the correlation between average expression and treatment specificity in leaf tissue samples (Fig. 3) and other tissues (Supplementary Figs.S34-S38) and replicated the result by simulating gene expression matrices (Supplementary Fig. S39).Thus, for all later partial correlation analyses (see next section) we quantified each gene's average expression after dropping experiments where the gene was not expressed (TPM < 5).

Treatment specificity correlates with levels of nonsynonymous diversity and divergence in genes
We next calculated partial correlations between treatment specificity and measures of selection after controlling for average expression, gene length, GC content, and tissue specificity in expression.These partial correlations were calculated separately for  4).Gene family size had among the weakest partial correlations with π N compared to other covariates, but strongly correlated with treatment specificity (ρ = 0.12, P − value = 6.3 × 10 −84 , Fig. 4b).All of these findings generally held when average expression and treatment specificity were calculated on data from other tissues (Supplementary Table S2, .

Correlations between treatment specificity and nonsynonymous variation persist after controlling for batch effects and dataset imbalance
While combining gene expression data across multiple studies can increase the statistical power of an analysis, there are some potential concerns.First, if many tissue-treatment combinations are not sampled, the dataset will be unbalanced and the effects of tissue and treatment variation on expression could be confounded.Consistent with this expectation, there was a high correlation between tissue specificity and treatment specificity in our initial analyses (Fig. 4, Supplementary Figs.S6-S10).Furthermore, combining data from multiple laboratories could generate batch effects (Leek et al. 2010).To address the issues of imbalance and batch effects, we first subset our data to only include treatments where all tissue types were represented.This subset included the treatments of control, abscisic acid, continuous light, warm/hot air temperature, and cold air temperature.We then used SVA to correct for the influence of unknown batch effects on these data subset (Leek and Storey 2007).After SVA, treatment specificity positively correlated with dN (ρ = 0.10, P − value = 1.6 × 10 −32 ) and π N (ρ = 0.07, P − value = 1.5 × 10 −23 ) when average expression and treatment specificity were calculated on combined fruit and flower data (Supplementary Fig. S33).However, treatment specificity in other tissue types generally did not correlate with our measures of selection (Supplementary Figs.S28-S33, Table S4).
The inclusion of only five treatments in the above analysis could limit quantification of a gene's treatment specificity.Thus, in order to include data from a larger number of treatments, avoid dataset imbalance, and avoid batch effects, we split our expression matrix into six subsets by tissue category.We then further removed treatments that only had expression data from one study to avoid confounding treatment effects with study-specific batch effects.We applied SVA (Leek and Storey 2007) to each of these tissue-specific subsets.After SVA, the expression profiles of most genes appear less treatment-specific (Supplementary Figs.S16-S21 panels a vs b).We also observed less separation in PCA space within treatment groups after SVA (for example, see Supplementary Fig. S16, c and d).Average expression levels before SVA were generally correlated with expression levels after SVA (Supplementary Figs.S16-S21 panels a and b).In partial correlations for theSVA-corrected leaf tissue subset, treatment specificity significantly correlated with dN (ρ = 0.13, P − value = 6.9 × 10 −50 ) and π N (ρ = 0.16, P − value = 3.9 × 10 −128 ) but less strongly correlated with Tajima's D (ρ = 0.04, P − value = 6.6 × 10 −10 ) and DoS (ρ = 0.05, P − value = 2.0 × 10 −8 ) (Table 1, Fig. 5).These patterns were similar in other tissue types (Supplementary Figs.S11-S15, Table S3).

Discussion
Our main finding is that genes with more treatment-specific expression patterns are, on average, under weaker selective constraint in A. thaliana.This is evident by treatment-specific genes generally having higher values of π N and dN, but not higher values of Tajima's D and DoS, compared to genes with more constitutive expression (Figs. 4 and 5).Our result does not refute the possibility of strong positive selection on treatment-specific genes, as is the case for nucleotide binding site leucine-rich repeat proteins (NBS-LRRs) in A. thaliana (Mondragón-Palomino et al. 2002).Rather, treatment-specific genes are simply under weaker selection on average compared with less treatment-specific genes.Altogether, this pattern is consistent with the hypothesis that a trade-off between the strength of selection and the treatment specificity of expression helps maintain variation in plasticity for A. thaliana (Snell-Rood et al. 2010;Van Dyken and Wade 2010).
There are a few ways to think about the biological relevance of the correlations of treatment specificity with π N and dN.First, the magnitude of treatment specificity's correlation with π N and dN was generally half the magnitude of average expression's correlation with π N and dN and similar to tissue specificity's correlation with π N and dN.Both tissue specificity and average expression are thought to be important determinants of protein evolution (Bush et al. 2015;Wu et al. 2022), suggesting the comparable effects of treatment specificity may be important too.Second, the effect of treatment specificity on π N and dN persisted even after simultaneously controlling for expression level, tissue specificity, gene length, GC content, and batch effects.Finally, the top 25% most treatment-specific genes in leaf tissue for our dataset have average dN and π N values nearly 2.5 times greater than the 25% least treatment-specific genes (dN = 0.025 vs 0.061; π N = 0.0014 vs 0.0032), but relatively similar Tajima's D and DoS values (Tajima's D = −0.44 vs −0.43; DoS = −0.19vs −0.14).These observations together suggest that treatment specificity is an important determinant of protein evolution.
This study disentangles several processes that were often difficult to resolve in previous research.First, many previous studies focus mainly on explaining trends in dN/dS (Gaut et al. 2011;Slotte et al. 2011;Bush et al. 2015), but both relaxed negative selection and increased positive selection can lead to increases in dN/dS.To tease apart these two processes, we additionally investigated treatment specificity's relationship with Tajima Average expression excludes values <5 TPM and was calculated using only leaf tissue samples.Treatment specificity was also calculated using only leaf tissue samples.Tissue specificity was calculated using only control samples across all tissue categories.The number of genes included in each partial correlation analysis (n) is listed at the top of each heatmap.
and DoS, compared to dN and π N , suggests that relaxed negative selection plays a larger role than increased positive selection in explaining the high evolutionary rates of treatment-specific genes.Furthermore, measures of expression specificity are often highly correlated with expression level (Slotte et al. 2011;Alvarez-Ponce and Fares 2012;Huang 2022).When calculating a gene's expression level, we only included samples where said gene was expressed (TPM > 5) to get an estimate of expression level that was still correlated with dN and π N , but was independent of expression specificity, allowing us to better disentangle these factors.Finally, previous studies have struggled to partition the factors that influence selection on genes in the presence of predictor variables with considerable error, such as expression level (Drummond et al. 2006;Plotkin and Fraser 2007;Yang and Gaut 2011).Error in expression measurements can often be attributed  to unmeasured differences between RNA-sequencing experiments (Leek et al. 2010) and we accounted for these differences using SVA (Leek and Storey 2007).Even after SVA, treatment specificity was strongly correlated with dN and π N (Fig. 5, a and b), suggesting our results are not an artifact of errors in expression measurement or combining expression data across many studies.Surprisingly, nearly all genes in A. thaliana have some degree of treatment specificity in their expression (Fig. 2a, Supplementary Fig. S3), reflecting results of previous studies on tissue specificity (Eisenberg and Levanon 2003).The high prevalence of treatment specificity in our dataset is partly explained by batch effects because SVA significantly lowered the apparent treatment specificity of most genes (Supplementary Figs.S16b-S21b) and reduced within-treatment differentiation in PCA space (for example, see Supplementary Fig. S16, c and d).This reduction in treatment specificity likely happened because batch effects can include unrecorded between-treatment differences (e.g. the humidity of the growth chamber, light intensity, watering schedule, etc.).Controlling for these unrecorded between-treatment differences thus causes the expression of genes to be less treatment-specific.However, even after batch correction most genes still showed some degree of treatment specificity (Supplementary Fig. S16b-S21b), suggesting it is rare for a gene to be expressed at the same level across many environments.
We also observed that genes with higher treatment specificity generally belonged to larger gene families.We expected gene family size to correlate with selection because singleton and duplicated genes often evolve at different rates (Davis and Petrov 2004;Jordan et al. 2004).Theory also suggests that gene duplication relaxes selection on duplicates, allowing for neo-and subfunctionalization (Lynch and Conery 2000;Aagaard et al. 2006).We could not investigate how gene family size correlates with dN or DoS because measuring these quantities requires identifying substitutions between orthologous genes.Thus, dN and DoS can only be reliably measured for 1:1 orthologs between A. thaliana and A. lyrata.However, π N and Tajima's D can be calculated for genes in larger families and we did observe persistent correlations between family size and Tajima's D (For Fig. 5c: ρ = 0.05, P − value = 3.1 × 10 −12 ; also see Supplementary Figs.S6c-S15c, S28c-S33c).Altogether, these correlations suggest that processes of gene duplication, neofunctionalization, and subfunctionalization could be connected to evolving some degree of treatment specificity.
Gene length was generally the second most correlated factor with dN and π N in our study, just behind average expression.This is consistent with previous work suggesting that longer proteins require more energy to synthesize and are thus under stronger selective constraints (Urrutia and Hurst 2001;Castillo-Davis et al. 2002;Eisenberg and Levanon 2003;Urrutia and Hurst 2003).However, while some previous studies in A. thaliana observe this same trend (Bush et al. 2015), others do not (Slotte et al. 2011).This discrepancy could be due to differences in how gene length is defined between studies.In this study, each gene's length included coding sequence as well as introns and untranslated regions, whereas other studies break down gene length into individual features (Bush et al. 2015).The goal of this study was not to understand differences in evolution between different gene features, so we included all gene features in our estimate of gene length.However, introns and untranslated regions experience different evolutionary patterns than coding sequences; for example, highly expressed genes being under selection for shorter introns (Castillo- Davis et al. 2002;Eisenberg and Levanon 2003).Therefore, future studies must clearly define even seemingly simple features like gene length to ensure that results are comparable across studies.
Although we focused on testing the idea that treatment specificity is responsible for relaxed negative selection in some genes, it is also possible that relaxed selection caused the evolution of treatment specificity.There is some evidence that relaxation of selection occurs before the evolution of expression specificity (Hunt et al. 2011) and may better explain cases of neo-and subfunctionalization (Lynch and Conery 2000;Aagaard et al. 2006).Future experiments that look at the evolution of treatment specificity and sequence evolution across a broader phylogenetic scale may be helpful for determining the order of these processes.
In summary, this study investigates a trade-off between the treatment-specific expression of a gene and the strength of selection said gene experiences, which is hypothesized to limit plasticity evolution.Consistent with this hypothesis, genes in A. thaliana with more treatment-specific expression are under weaker selection compared to more evenly expressed genes.While we find that this trade-off exists, we could not dissect the direction of causality in the trade-off or determine how much this trade-off constrains plasticity evolution relative to other processes.However, these are exciting areas of future research.Future studies should ideally generate fully balanced datasets on gene expression acquired across natural environmental gradients.Taking these steps will contribute to a comprehensive understanding of the constraints on plasticity and protein evolution.

Fig. 1 .
Fig.1.Principal components analysis of all expression data.Each point represents a different RNA-seq experiment and is colored by its associated tissue type.Experiments from all treatment conditions are included in this analysis.The percent values on the axes represent the percent variation explained by each principal component.Plotting order was randomized to avoid overplotting.

Fig. 2 .
Fig.2.Density plots of key variables measured in this study.a) Distribution of treatment specificity in leaf tissue expression across all genes included in this study.The area underneath the curve in a given interval of treatment specificity represents the proportion of genes in this study that fall within that range of treatment specificity.b) Distribution of dN/dS across all genes included in this study.The area to the right of the dashed line represents the proportion of genes in this study with dN/dS > 1. c) Distribution of π N /π S across all genes included in this study.The area to the right of the dashed line represents the proportion of genes in this study with π N /π S > 1. d) Distribution of DoS across all genes in this study.Area to the right of the dashed line represents the proportion of genes with DoS > 0, which is interpreted as evidence of adaptive evolution.

Fig. 3 .
Fig. 3. Correlation between the average expression in transcripts per million (TPM) and treatment specificity of genes when samples with low expression (<5 TPM) are included a) vs excluded b).Expression level and treatment specificity were calculated using only data from leaf tissue samples.Line is a smoothing line with 95 % confidence intervals and values in parentheses give spearman correlation.

Fig. 4 .
Fig. 4. Partial correlation analysis including either a) dN, b) π N , c) Tajima's D, or d) direction of selection (DoS) as a covariate.Average expression excludes values <5 TPM and was calculated using only leaf tissue samples.Treatment specificity was also calculated using only leaf tissue samples.Tissue specificity was calculated using only control samples across all tissue categories.The number of genes included in each partial correlation analysis (n) is listed at the top of each heatmap.

Fig. 5 .
Fig. 5. Partial correlations for a) dN, b) π N , c) Tajima's D, and d) direction of selection (DoS) based on leaf tissue data subset after applying SVA.Data were further subset to include only treatment groups with data from more than one study before applying SVA.Average expression calculation excludes values <5 TPM.The number of genes included in each partial correlation analysis (n) is listed at the top of each heatmap.

Table 1 .
Partial correlations between treatment specificity and different measures of selection pre-SVA and post-SVA.All correlation coefficients are spearman coefficients and are calculated only on leaf tissue samples.b All P-values represent whether correlation coefficient significantly differs from 0. a