Genetic adaptations in the population history of Arabidopsis thaliana

Abstract A population encounters a variety of environmental stresses, so the full source of its resilience can only be captured by collecting all the signatures of adaptation to selection by the local environment in its population history. Based on the multiomic data of Arabidopsis thaliana, we constructed a database of phenotypic adaptations (p-adaptations) and gene expression adaptations (e-adaptations) in the population. Through the enrichment analysis of the identified adaptations, we inferred a likely scenario of adaptation that is consistent with the biological evidence from experimental work. We analyzed the dynamics of the allele frequencies at the 23,880 QTLs of 174 traits and 8,618 eQTLs of 1,829 genes with respect to the total SNPs in the genomes and identified 650 p-adaptations and 3,925 e-adaptations [false discovery rate (FDR) = 0.05]. The population underwent large-scale p-adaptations and e-adaptations along 4 lineages. Extremely cold winters and short summers prolonged seed dormancy and expanded the root system architecture. Low temperatures prolonged the growing season, and low light intensity required increased chloroplast activity. The subtropical and humid environment enhanced phytohormone signaling pathways in response to biotic and abiotic stresses. Exposure to heavy metals selected alleles for lower heavy metal uptake from soil, lower growth rate, lower resistance to bacteria, and higher expression of photosynthetic genes. The p-adaptations are directly interpretable, while the coadapted gene expressions reflect the physiological requirements for the adaptation. The integration of this information characterizes when and where the population has experienced environmental stress and how the population responded at the molecular level.


Table of Contents
Supplementary Methods
S1. Filtering of polymorphic sites for TreeMix and PolyGraph
S2. Traits/gene expressions and their QTLs/eQTLs for PolyGraph
Figure S1 The sampling points of Arabidopsis thaliana
Figure S2 Multidimensional scaling of genotype data
Figure S3 The admixture graph estimated by TreeMix
Figure S4 The admixture graph of the countries with ten admixtures
Figure S5 The distributions of numbers of QTLs and eQTLs
Figure S6 The traits and genes with many QTLs and eQTLs tend to have undergone multiple times p- and e-adaptations
Figure S7 The numbers of p-adaptations and e-adaptations along the edges of the admixture graph
Figure S8 The assignment of the sample from Russia to the admixture groups
Figure S9 Climates of the cities representing the sampling locations from the four lineages with large scale adaptations
Figure S10 The sampling points in Azerbaijan
Figure S11 The assignment of the sample from the United States to the admixture groups
Figure S12 The proportions of the H-alleles at the photosynthesis-related eQTLs, whose allele frequencies changed significantly (FDR = 0.05) along the lineage to the United States
Figure S13 The output produced by OptM
Figure S14 The output produced by OptM obtained by block resampling of genomic regions consisting of 300 SNPs
Figure S15 The admixture graph estimated by TreeMix, assuming the number of admixture edges, m = 1
Figure S16 Variable mean expression levels of DOG1 among countries
Table S1 Enrichment analysis of the genes with identified e-adaptations
Table S2 Enrichment analysis of the genes with no identified e-adaptations
Table S3 The numbers of p-adaptations and e-adaptations along each of the edges of the admixture graph
Table S4 p-adaptations and enrichment analysis of e-adaptations along the lineage to Central Asia and South Siberia, Russia
Table S5 p-adaptations and enrichment analysis of e-adaptations along the lineage to Sweden
Table S6 p-adaptations and enrichment analysis of e-adaptations along the lineage to Azerbaijan
Table S7 p-adaptations and enrichment analysis of e-adaptations along the lineage to the United States
Table S8 Enrichment analysis of the QTL-coding genes of the cadmium concentrations in leaves (Cd111)
Reference for Supplementary information
Data S01 The estimated α values (selection parameters) of p-adaptations at each edge of the admixture graph
Data S02 The estimated α values of e-adaptations at each edge of the admixture graph
Data S03 The Z values of the estimated α values of p-adaptations at each edge of the admixture graph
Data S04 The Z values of the estimated α values of e-adaptations at each edge of the admixture graph
Data S05 p-adaptations at each edge (selected traits, annotation and enrichment analysis of causal genes)
Data S06 e-adaptations at each edge (selected gene expressions, annotation and enrichment analysis of these genes)
Data S07 p-adaptations and e-adaptations along the four lineages

… Nordborg & Bergelson, 1999; Lasky et al., 2012). Green triangles and orange circles show seed germination and flowering times in Sweden (Ågren & Schemske, 2012).

Table S3 The numbers of p-adaptations and e-adaptations along each of the edges of the admixture graph. The edges are labeled by the two nodes they connect (Figure S3).

Figure S1 The sampling points of Arabidopsis thaliana. The geographic distribution of the samples recorded in AtMAD is represented by red points.

Figure S2 Multidimensional scaling of genotype data.

Figure S5 The distributions of numbers of QTLs and eQTLs. (a) The distribution of the number of QTLs among the traits with identified QTLs. Out of 516 traits, 248 had identified QTLs with p-values less than 10 . (b) The distribution of the number of eQTLs among the gene expressions with identified eQTLs. Out of 33,602 gene expressions, 2,879 had identified eQTLs with p-values less than 10 . Note that the x-axis is log-scaled.

Figure S7 The numbers of p-adaptations and e-adaptations along the edges of the admixture graph. Bar plots representing the numbers of the adaptations are shown. The edges are labeled by the two nodes they connect (Figure S3). The exact numbers are shown in Table S3.

Figure S8 The assignment of the sample from Russia to the admixture groups.

Figure S10 The sampling points in Azerbaijan. Points in red are in Azerbaijan, while points in purple are in Armenia and Georgia.

Figure S11 The assignment of the sample from the United States to the admixture groups.

Figure S12 The proportions of the H-alleles at the photosynthesis-related eQTLs whose allele frequencies changed significantly (FDR = 0.05) along the lineage to the United States. The sample from the United States in the admixture groups of Western Europe and Germany, the sample from the United Kingdom of Great Britain in the admixture group of Western Europe, and the sample from Germany in the admixture group of Germany are contrasted. The photosynthesis-related genes that were identified in the enrichment analysis of e-adaptations along the lineage to the United States and the eQTLs whose allele frequencies changed significantly (FDR = 0.05) along this lineage were analyzed. Each eQTL has an H-allele (higher expression) and an L-allele (lower expression). For each individual, the proportion of H-alleles was calculated. Here we refer to these alleles as derived alleles simply because they are minor alleles.
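The per-individual proportion of H-alleles mentioned in this caption can be illustrated with a short sketch. It is not the authors' pipeline: the genotype encoding (number of H-allele copies per eQTL, `None` for a missing call), the default ploidy of 2, and the function name `h_allele_proportion` are all assumptions introduced for illustration.

```python
from typing import Optional, Sequence


def h_allele_proportion(genotypes: Sequence[Optional[int]], ploidy: int = 2) -> float:
    """Fraction of H-alleles among all called alleles of one individual.

    genotypes: copies of the H-allele at each eQTL (0..ploidy), None if missing.
    The encoding and the ploidy default are assumptions, not taken from the paper.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this individual")
    if any(g < 0 or g > ploidy for g in called):
        raise ValueError("genotype outside the range 0..ploidy")
    return sum(called) / (ploidy * len(called))


# Hypothetical individual: four eQTLs, one of them with a missing call.
example = [2, 0, None, 2]
print(h_allele_proportion(example))  # 4 H-alleles out of 6 called alleles
```

Missing calls are dropped from both numerator and denominator, so individuals with different numbers of called eQTLs remain comparable.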

Figure S13 The output produced by OptM. (a) The mean and standard deviation (SD) across 10 iterations for the composite likelihood L(m) (left axis, black circles) and proportion of variance explained (right axis, red "x"s). (b) The second-order rate of change (Δm) across values of m.

Figure S14 The output produced by OptM obtained by block resampling of genomic regions consisting of 300 SNPs. (a) The mean and standard deviation (SD) across 10 iterations for the composite likelihood L(m) (left axis, black circles) and proportion of variance explained (right axis, red "x"s). (b) The second-order rate of change (Δm) across values of m.
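Block resampling keeps physically linked SNPs together by drawing whole blocks rather than single sites. The sketch below is one possible reading of "block resampling of genomic regions consisting of 300 SNPs": it assumes contiguous, non-overlapping blocks in genome order that are sampled with replacement. The text does not specify these details, and `block_resample` is a hypothetical helper, not part of TreeMix or OptM.

```python
import random
from typing import List, Optional


def block_resample(n_snps: int, block_size: int = 300,
                   seed: Optional[int] = None) -> List[int]:
    """Return SNP indices for one bootstrap replicate built from whole blocks.

    Blocks are contiguous runs of `block_size` SNPs (the last one may be shorter).
    As many blocks are drawn, with replacement, as there are blocks in the data.
    """
    rng = random.Random(seed)
    starts = list(range(0, n_snps, block_size))
    chosen = [rng.choice(starts) for _ in starts]
    indices: List[int] = []
    for start in chosen:
        indices.extend(range(start, min(start + block_size, n_snps)))
    return indices


# Hypothetical data set of 900 SNPs, i.e. three full blocks of 300.
replicate = block_resample(900, block_size=300, seed=1)
print(len(replicate))  # 900, because every block here has exactly 300 SNPs
```

When the number of SNPs is not a multiple of the block size, the final block is shorter and the length of a replicate can vary slightly between draws.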

Figure S15 The admixture graph estimated by TreeMix, assuming the number of admixture edges, m = 1.
1%. Finally, 37,718 SNPs remained with MAF ≥ 1%. For the PolyGraph analysis, we used subsets of these SNPs to contrast with QTLs and eQTLs. In total, 49,973 QTLs and 16,672 eQTLs were included in the AtMAD database. The 37,718 SNPs mentioned above contained 734 QTLs and 83 eQTLs. Since PolyGraph contrasts the between-population variation in allele frequencies of neutral SNPs with that of QTLs/eQTLs, we excluded the QTLs and eQTLs from the 37,718 SNPs, and used 36,984 (= 37,718 − 734) SNPs to identify p-adaptations and 37,635 (= 37,718 − 83) SNPs to identify e-adaptations.

S2. Traits/gene expressions and their QTLs/eQTLs for PolyGraph
The AtMAD database contained 49,973 QTLs and 16,672 eQTLs, which were subsets of the above 12,883,854 polymorphic sites. Among the 49,973 QTLs, 2 were non-biallelic. Of the 16,672 eQTLs, 2,933 were non-biallelic.
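The exclusion of QTLs and eQTLs from the 37,718 background SNPs amounts to a set difference. The following sketch reproduces the reported counts with synthetic identifiers. The integer IDs and the helper name `neutral_snps` are assumptions made for illustration, since real data would use chromosome and position keys.

```python
from typing import Set


def neutral_snps(all_snps: Set[int], trait_loci: Set[int]) -> Set[int]:
    """Drop every SNP that is itself a QTL/eQTL from the background set."""
    return all_snps - trait_loci


# Synthetic IDs sized to match the counts reported in the text.
all_snps = set(range(37_718))     # SNPs remaining after the MAF filter
qtls = set(range(734))            # QTLs found among those SNPs
eqtls = set(range(1_000, 1_083))  # eQTLs found among those SNPs

background_p = neutral_snps(all_snps, qtls)   # background for p-adaptations
background_e = neutral_snps(all_snps, eqtls)  # background for e-adaptations
print(len(background_p), len(background_e))   # 36984 37635
```

Using separate background sets for traits and for gene expressions mirrors the two counts given in the text, 36,984 and 37,635.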
Table S1 Enrichment analysis of the genes with identified e-adaptations

Table S4 p-adaptations and enrichment analysis of e-adaptations along the lineage to Central Asia and South Siberia, Russia

Table S8 Enrichment analysis of the QTL-coding genes of the cadmium concentrations in leaves