Traces of strong selective pressures in the genomes of C4 grasses

C4 photosynthesis is nature's response to CO2 limitations, and evolved recurrently in several groups of plants. To identify genes related to C4 photosynthesis, Huang et al. looked for evidence of past episodes of adaptive evolution in the genomes of C4 grasses. They identified a large number of candidate genes that evolved under divergent selection, indicating that, besides alterations to expression patterns, the history of C4 involved strong selection on protein-coding sequences.

The C4 syndrome relies on a series of anatomical and biochemical adaptations that function together to concentrate CO2 in some parts of the leaf (Hatch, 1987). This effectively boosts photosynthesis, and increases growth rates in subtropical and tropical conditions (Atkinson et al., 2016). The prospect of improving non-C4 crops, such as rice and wheat, by engineering an efficient C4 cycle in them is therefore very appealing, and several projects have been set up in an attempt to realize this ambitious goal. Unfortunately, while the main enzymes of the C4 pathway were identified long ago and have been characterized in detail, the genetic mechanisms underlying regulation of the pathway, transport of metabolites, and leaf anatomy remain poorly understood.
Engineering a complex biochemical pathway, which requires the action and coordination of numerous proteins, is virtually impossible when some of the underlying genes are yet to be identified. Evolution successfully engineered this intricate pathway, and did it a remarkably large number of times for such a complex trait (Sage et al., 2011). While the details of how this happened are still to be elucidated, the traces of this accomplishment should still be present in the genomes of extant species. Any single genome consists of a 'long list of letters' that is difficult to decipher, and yet the comparison of multiple genomes has the power to reveal changes that happened during evolution. Obviously, the significance of these changes is another problem, but past evolutionary pressures left specific footprints on the small fraction of genomes that correspond to protein-coding genes.
Because most amino acids can be encoded by several different nucleotide triplets, some nucleotide changes (substitutions) do not affect the protein. These are synonymous substitutions, while non-synonymous substitutions change the amino acid and so result in a slightly different protein. Under a purely stochastic model, the rates of fixation of these two types of substitutions should be similar and, as such, their ratio (dN/dS) should equal one (Yang, 1998). Most non-synonymous changes will, however, be detrimental and thus preferentially removed by selection, leading to an observed dN/dS much smaller than one in most cases. Exceptionally, when a change in the catalytic properties of the encoded enzyme benefits the organism, the rate of fixation of non-synonymous substitutions will increase, leading to an observed dN/dS that can exceed one, at least for some parts of a gene. Such instances of positive selection are classically associated with 'arms races' between hosts and pathogens, leading to sustained elevated dN/dS throughout the history of the gene (Endo et al., 1996). However, episodic changes to the catalytic environment can also increase dN/dS for limited periods, corresponding to a few branches of a phylogenetic tree. Huang et al. looked for such traces of past episodes of adaptive evolution linked to C4 photosynthesis by comparing the genomes of C4 and non-C4 grasses (Box 1).
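The distinction between synonymous and non-synonymous changes can be illustrated with a minimal Python sketch. It is not the method used by Huang et al.: it only counts codons that differ between two aligned coding sequences and sorts them by whether the encoded amino acid changes, assuming the standard genetic code. Estimating dN/dS itself additionally requires normalizing by the numbers of synonymous and non-synonymous sites and correcting for multiple substitutions, which this sketch does not attempt. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Count differing codons between two aligned coding sequences.

    Returns (synonymous, non_synonymous): the number of differing codons
    that encode the same amino acid and the number that encode a different
    one. Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")

    synonymous = 0
    non_synonymous = 0
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: cannot be classified
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy example (hypothetical sequences): GCT -> GCC keeps alanine,
# whereas AAA -> AGA replaces lysine with arginine.
print(count_codon_differences("ATGGCTAAAGGT", "ATGGCCAGAGGT"))
```

For the toy pair, one differing codon is synonymous and one is non-synonymous. On real genes, an excess of the second category on particular branches, once properly normalized, is what a genome scan for elevated dN/dS looks for.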

Evidence of past positive selection reveals candidate genes for C4 photosynthesis
Tracking evolutionary modifications to identify changes linked to C4 photosynthesis is not a new idea. The many independent origins of the C4 trait make it particularly amenable to comparative studies, enabling identification of the ecological and physiological changes linked to its evolution (e.g. Edwards and Smith, 2010; Atkinson et al., 2016). In recent years, attempts to identify all of the C4-related genes similarly relied on evolution-based comparisons, but these mainly focused on gene expression (Bräutigam et al., 2014; Mallmann et al., 2014). It is only more recently that attention has turned to genomic changes, such as duplication of genes (Emms et al., 2016). While adaptive evolution of C4 enzymes involving kinetic changes has been reported (Svensson et al., 2003; Christin et al., 2007), one might hypothesize that this would concern only a handful of enzymes, specifically those linked to core C4 reactions and their very high catalytic rates. Huang and colleagues decided to challenge this assumption and adopt bioinformatic approaches to identify all the genes that evolved under elevated dN/dS, specifically in C4 grasses.
After screening the genomes of six grasses, including three C4 taxa belonging to two independent C4 origins (Box 1), Huang et al. identified 88 genes that evolved under elevated dN/dS on branches belonging to one or several of the C4 lineages. This type of genome scan is inherently subject to false positives. In addition, the methodology cannot strictly differentiate between adaptive evolution and relaxed selection. Finally, the genes might have been under divergent selection along these branches for reasons other than C4 evolution. Fortunately, the putative link with the C4 trait was confirmed for a number of candidates by independent evidence, including a priori knowledge for a few of them and high expression in C4 tissues for many others. The list produced by Huang et al. therefore includes many promising candidates, some of which might be linked to C4 anatomy. If confirmed, their identification would represent a major breakthrough for the engineering of C4 photosynthesis into non-C4 crops. In the short term, the results already affect the way we should envision C4 evolution.

Physiological innovation through adaptive evolution of numerous protein-coding genes
For the most part, previous studies have linked phenotypic variation to alterations in gene expression and regulation (King and Wilson, 1975; Brawand et al., 2011). While these have certainly also played a key role in the evolution of C4 photosynthesis, the new results show that the modification of promoter sequences and regulatory networks is only one part of the story, which also includes adaptive changes in the coding sequences of a large number of genes. This is impressive, providing even more evidence that the recurrent transition to C4 photosynthesis represents a considerable evolutionary feat. The observations of Huang et al. also reveal a new set of questions; in particular, why did the coding sequences of so many proteins need to be adapted, both in terms of biochemical properties and evolutionary drivers? The biochemical component of this question will remain unanswered until extensive characterization is performed, and yet hints about the evolutionary pressures can already be proposed.
The precise timing of positive selection episodes is beyond the scope of this comparative work because of the limited number of species sampled, which corresponds to the few grasses for which a complete genome is currently available. Indeed, these episodes are inferred along phylogenetic branches that extend from the last divergence of C4 and non-C4 taxa to the first divergence of two C4 taxa within the same lineage (Box 1). With only six species, this interval begins long before the transition to C4 photosynthesis and continues for a long period after C4 evolved, spanning up to 20 million years of changes. As more genomes become available, similar analyses will be able to pin down the timing of these episodes of positive selection with more precision. Until then, we can only speculate.
As with any complex trait, the numerous changes that define extant C4 plants were probably spread over long periods of evolutionary time, from the occurrence of capacitating mutations in non-C4 ancestors to changes directly responsible for the emergence of a C4 physiology, and continuous adaptive alterations after its origin (Christin and Osborne, 2014). Modelling efforts predict that changes in expression patterns can lead to the emergence of a C4 cycle in plants with C4-like anatomical traits (Heckmann et al., 2013; Mallmann et al., 2014). However, evolution did not stop after the initial transition to C4 photosynthesis, and the presence of a working C4 cycle, even if rudimentary, probably created a selective impetus for the fixation of substitutions that improved the C4 syndrome. The genes detected by Huang et al. probably underwent adaptive mutations that improved the fit of the proteins to the new catalytic environment. Their impressive number suggests that the selective pressure for improving the C4 syndrome was very strong or maintained over a long evolutionary period, possibly throughout the diversification of C4 plants.
Box 1. Adaptive evolution in C4 grasses. Phylogeny of the six species included in the analysis from Huang et al.: maize (Zea mays), Sorghum bicolor, Setaria viridis (wild progenitor of Setaria italica), Dichanthelium oligosanthes, Brachypodium distachyon and rice (Oryza sativa). Red branches indicate where the C3 to C4 transitions occurred. Photos courtesy of Pu Huang, James Schnable and Elizabeth Kellogg.
Until now, research on C4 evolution has focused mainly on the events that led to a C4 cycle, largely ignoring those that followed its emergence. The discovery of widespread C4-related selection on coding genes should motivate new research into the changes that contributed to the improvement or diversification of the C4 syndrome. A first step in this direction is to acknowledge the diversity of C4-related traits within each C4 lineage, and design comparative experiments that capture this diversity. With continuous advances in sequencing technology, this goal might soon become achievable for comparative genomics, contributing towards a full elucidation of the changes that were selected, both before and after the first C4 plants emerged.