Core circadian clock and light signaling genes brought into genetic linkage across the green lineage

Abstract The circadian clock is conserved in plants at the level of both transcriptional networks and core genes, ensuring that biological processes are phased to the correct time of day. In the model plant Arabidopsis (Arabidopsis thaliana), the core circadian SHAQKYF-type-MYB (sMYB) genes CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) and REVEILLE 4 (RVE4) show genetic linkage with PSEUDO-RESPONSE REGULATOR 9 (PRR9) and PRR7, respectively. Leveraging chromosome-resolved plant genomes and syntenic ortholog analysis enabled tracing this genetic linkage back to Amborella trichopoda, a sister lineage to the other angiosperms, and identifying an additional evolutionarily conserved genetic linkage in light signaling genes. The LHY/CCA1–PRR5/9, RVE4/8–PRR3/7, and PIF3–PHYA genetic linkages emerged in the bryophyte lineage and progressively moved within several genes of each other across an array of angiosperm families representing distinct whole-genome duplication and fractionation events. Soybean (Glycine max) maintained all but two genetic linkages, and expression analysis revealed that the PIF3–PHYA linkage, which overlaps the E4 maturity group locus, was the only pair to robustly cycle with an evening phase, in contrast to the morning and midday phases of the sMYB–PRR pairs. While most monocots maintain the genetic linkages, they have been lost in the economically important grasses (Poaceae), such as maize (Zea mays), where the genes have been fractionated to separate chromosomes and presence/absence variation results in the segregation of PRR7 paralogs across heterotic groups. The environmental robustness model is put forward, suggesting that evolutionarily conserved genetic linkages ensure superior microhabitat pollinator synchrony, while wide hybrids or unlinked genes, as seen in the grasses, result in heterosis, adaptation, and colonization of new ecological niches.

2002; Oda et al., 2007). The double mutant (cca1/lhy) results in an even shorter FRP (6 hrs) than the single mutant, as well as early flowering and very small stature, suggesting that these paralogs are partly redundant in the core clock (Mizoguchi et al., 2002; Lu et al., 2009; Salomé et al., 2010).
In addition to CCA1 and LHY, the sMYB sub-family consists of eight other genes that share both the "SHAQKYF" motif and dawn-specific expression and thus were named REVEILLE (RVE) (Chaudhury et al., 1999). RVE3, RVE4, RVE5, RVE6, and RVE8 form a subclade of the RVEs since they also share the LHY-CCA1-like (LCL) domain and hence have also been referred to as LCL3, LCL1, LCL4, LCL2, and LCL5, respectively (Farinas and Mas, 2011). Members of the LCL sub-clade are generally expressed at dawn like the other RVEs, with two exceptions: RVE6/LCL2 cycles only under LD and SD conditions (not under circadian free-run conditions), and RVE4 peaks in the afternoon under thermocycles (Michael et al., 2008). In contrast, RVE1, RVE2, and RVE7 peak at ZT0 (dawn), ZT18 (midnight), and ZT07 (afternoon), respectively (Supplemental Figure S1) (Michael et al., 2008).
The first RVE gene described was RVE7/EARLY-PHYTOCHROME-RESPONSIVE 1 (EPR1); a gain-of-function (overexpression) line did not result in circadian period defects but did cause late flowering under LD and repressed its own expression, consistent with it forming a slave oscillator (Kuno et al., 2003). Next, an overexpressing line of RVE2/CIRCADIAN 1 (CIR1) was described that resulted in a short FRP, delayed flowering, longer hypocotyls, and reduced seed germination in the dark (Zhang et al., 2007). Similar to RVE7, loss of RVE1 does not result in a defect in FRP, but it does result in changes in growth due to alterations in the auxin pathway (Rawat et al., 2009). The LCL sub-clade was the last to be described, beginning with RVE8/LCL5; overexpression results in a shorter FRP and late flowering under both LD and SD, while loss of function causes a long FRP and early flowering (Farinas and Mas, 2011; Rawat et al., 2011). Loss of function of either of the closely related RVE4/LCL1 or RVE6/LCL2 does not result in a change in FRP, but the double (rve4/8, 27 hrs; rve6/8, 26 hrs) or triple (rve4/6/8, 28 hrs) mutants with RVE8/LCL5 show a progressively longer FRP (Figure 1) (Hsu et al., 2013). Loss of function of either RVE3/LCL3 or RVE5/LCL4 also does not result in FRP changes, while the double mutant (rve3/5) has a slightly shorter FRP and the quintuple mutant (rve3/4/5/6/8) has an even longer FRP (28 hrs) (Gray et al., 2017).
Since both the sMYB and PRR gene families are redundant, combinatorial mutation analysis provides clues as to the significance of the CCA1-PRR9 and RVE4-PRR7 genetic linkages. One study has looked at the loss of the CCA1-PRR9 linkage, but in the context of the redundant genes LHY and PRR7 (Salomé et al., 2010). The double mutant prr7/9 results in a very long FRP, yet when either CCA1 or LHY is reduced using artificial microRNA (amiR) silencing, the period is shortened almost to the wild-type FRP, while reducing both CCA1 and LHY results in a short FRP similar to that of the lhy/cca1 double mutant (Salomé et al., 2010). These results suggest that the CCA1-PRR9 linkage results in reciprocal impacts on FRP, but that CCA1 and LHY are epistatic to PRR7 and PRR9 with respect to FRP.
In contrast to the lhy/cca1 double mutant, which results in plants with a smaller stature, the rve4/6/8 triple mutant results in larger plants, and the increased growth is dependent on PIF4 and PIF5 (Gray et al., 2017). However, loss of both lhy/cca1 and rve4/6/8 (lhy/cca1/rve4/6/8 quintuple) restores growth and FRP, suggesting that the two clades of sMYB have reciprocal and dispensable roles in maintaining timing information and growth (Shalit-Kaneh et al., 2018). So what are these specific feedback loops used for? While the lhy/cca1/rve4/6/8 quintuple has restored growth and FRP, its circadian clock is less robust, with decreased amplitude and a suboptimal response to adverse environmental conditions (Shalit-Kaneh et al., 2018). Since CCA1-LHY-mediated temperature compensation requires both PRR7 and PRR9 (Salomé et al., 2010), it is possible that the CCA1-PRR9 and RVE4-PRR7 linkages represent co-inherited positive and negative regulators of growth, respectively, acting through the PIFs in a thermo- and photo-sensitive way. To this end, the phase of expression of PRR9 and RVE4 is shifted by 4 hours under thermocycles (while that of CCA1 and PRR7 is not), suggesting differential integration of thermocycle information (Michael et al., 2008).

Gene neighborhoods in plant genomes
Gene order in eukaryotes is generally poorly conserved, resulting in seemingly random organization across chromosomes, in contrast to prokaryotes, where genes are often organized in functional arrays, or operons (Rocha, 2008). However, with more high-quality genomes and analytical tools, it has become clear that there is in fact some level of gene clustering in eukaryotes and that some gene order is evolutionarily conserved (Hurst et al., 2004; Michalak, 2008). Two different studies across a collection of eukaryotic genomes spanning from plants to humans revealed that functionally and transcriptionally related genes are found in non-random clusters in the genome (Lee and Sonnhammer, 2003; Dávila López et al., 2010). In humans, bidirectional promoters play a role in proximally co-expressed genes (Adachi and Lieber, 2002; Trinklein et al., 2004). In yeast, essential genes are more likely to be found in clusters where the recombination rate is lower, and this is independent of co-expression, suggesting that at some level genetics plays a role in preserving gene order (Pál and Hurst, 2003).
While many studies focus on identifying clusters based on functional or co-expression information, several tools have been developed to take an unbiased approach to find "gene neighborhoods," or Proximal Ortholog Gene (POG) pairs of non-homologous genes (Winter et al., 2016; Marcet-Houben and Gabaldón, 2020; Foflonker and Blaby-Haas, 2021). Leveraging an evolutionary approach, up to 32% of the gene space across 341 fungal genomes is found in gene neighborhoods, with many representing metabolic clusters (Marcet-Houben and Gabaldón, 2019). In algal genomes far fewer gene neighborhoods were identified, but they revealed several novel non-metabolic pathways (Foflonker and Blaby-Haas, 2021).
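The idea of a conserved gene neighborhood can be illustrated with a minimal Python sketch. It is not the algorithm of any of the tools cited above; it assumes that each genome has already been reduced to an ordered list of ortholog-group labels per chromosome, and it simply counts in how many genomes two different groups lie within `max_gap` intervening genes of each other. The function names, the `max_gap` and `min_genomes` parameters, and the toy species and gene orders are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def proximal_pairs(genome, max_gap=5):
    """Return the unordered pairs of ortholog groups that lie close together.

    genome: dict mapping chromosome name -> list of ortholog-group labels
            in chromosomal order (an assumed, simplified input format).
    max_gap: maximum number of intervening genes allowed between the pair.
    """
    pairs = set()
    for genes in genome.values():
        for i, first in enumerate(genes):
            # Look ahead at the next (max_gap + 1) genes on the chromosome.
            for second in genes[i + 1 : i + 2 + max_gap]:
                if first != second:  # skip tandem duplicates of one group
                    pairs.add(frozenset((first, second)))
    return pairs


def conserved_pairs(genomes, max_gap=5, min_genomes=2):
    """Count, for each proximal pair, the number of genomes that contain it."""
    counts = defaultdict(int)
    for genome in genomes.values():
        for pair in proximal_pairs(genome, max_gap):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Toy gene orders (hypothetical; not taken from any real genome assembly).
toy_genomes = {
    "speciesA": {
        "chr1": ["g1", "CCA1", "g2", "PRR9", "g3"],
        "chr2": ["PIF3", "PHYA"],
    },
    "speciesB": {
        "chr1": ["CCA1", "PRR9", "g4"],
        "chr2": ["PIF3", "g5", "g6", "g7", "PHYA"],
    },
    "speciesC": {
        "chr1": ["CCA1", "g8"],
        "chr3": ["PRR9", "PHYA", "PIF3"],
    },
}

if __name__ == "__main__":
    result = conserved_pairs(toy_genomes, max_gap=2, min_genomes=2)
    for pair, n in sorted(result.items(), key=lambda item: sorted(item[0])):
        print(sorted(pair), n)
```

In this toy input, the CCA1 and PRR9 labels are neighbors in two of the three species, as are PIF3 and PHYA, so both pairs pass the `min_genomes=2` threshold, while pairs seen in only one species are discarded. A real analysis would also need ortholog assignment, handling of whole-genome duplicates, and a statistical background model, none of which are attempted here.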
In plants, systematic searches for gene neighbors have primarily been restricted to metabolic pathways (Osbourn, 2010; Kautsar et al., 2017; Nützmann et al., 2018; Nützmann et al., 2020; Bharadwaj et al., 2021) or co-expressed genes (Williams and Bowles, 2004; Zhan et al., 2006; Chen et al., 2010). Plants are special amongst the eukaryotes since they undergo extensive whole genome duplication (WGD) and polyploidy events followed by rounds of fractionation that greatly increase the random order of genes and decrease gene synteny across lineages (Vision, 2005; Cheng et al., 2018). For instance, it has been shown across an array of high-quality genomes of mammals and plants that only closely related plants retain a level of synteny similar to that found across all mammals (Zhao and Schranz, 2019). Therefore, the sMYB-PRR and PIF3-PHYA evolutionarily conserved non-homologous gene clusters involved in a genetic network (as opposed to a metabolic pathway) are the first to be described across plant genomes.

The evolutionary significance of the environmental robustness model
The increasing closeness of the gene linkages appears around the same time as the rise to dominance of angiosperms over gymnosperms and ferns during the Cretaceous (Condamine et al., 2020). The phenotypic and species diversity of the angiosperms has been attributed to multiple rounds of whole genome duplication (polyploidy) and fractionation (Soltis et al., 2009), which is the process by which the gene linkages are moving closer together over evolutionary time. There are two forces at work here. First, polyploidy often brings together distant genomes (allopolyploidy), which is thought to be maintained because hybrid vigor enables plants to thrive in harsh or disparate environments or with an asexual lifestyle (Fawcett et al., 2009; Cheng et al., 2018). Second, polyploids are ultimately reduced back to diploids (Zhao et al., 2017), which must thrive in their specific environment, yet sex is risky because it would be easy to make an unwanted genetic combination for a local environment (Freeling, 2017).
It is thought that the major innovation that led to the dominance of angiosperms was the flower and the specific relationship that it fostered with pollinators (Supplemental Figure S10) (Regal, 1977). Therefore, the linkage of light and circadian genes ensures that plants are tuned to exploit their specific environments, inheriting the correct combination for their local conditions so they will grow optimally under different seasons (Michael et al., 2003; Dodd et al., 2005). Another burst of polyploidy occurred at the Cretaceous/Paleogene (K/Pg) boundary that coincided with several natural disasters (Fawcett et al., 2009), and at this time the light and circadian gene linkages moved closer together in almost all species tested, except the grasses. This suggests that most plants "doubled down" on ensuring that the circadian system was inherited for a specific location; perhaps the global decrease in temperature and carbon dioxide made it more likely that plants specially tuned for their environment would thrive and reproduce (Condamine et al., 2020).
At the same time, grasses became the most successful angiosperms and started to fill new ecological niches such as shaded forests and later open plains (Linder et al., 2018). Grasses are completely wind pollinated and flower at specific times of day (TOD) (Friedman and Barrett, 2009), suggesting they have taken the exact opposite route from other angiosperms and aggressively ensure that every progeny has a new combination of circadian and light alleles. In essence, every pollination event represents a wide hybrid that experiences heterosis, or hybrid vigor, which enables it to outcompete populations in its new location. This strategy has been termed the "Viking syndrome," describing the ability of the grasses to colonize, persist in, and transform their environments (Linder et al., 2018). Taken together, close genetic linkage favors animal pollination, where specific circadian timing states are maintained, whereas broken linkages favor wind pollination, where diverse circadian states enable possible colonization of new environments.

Supplemental Tables
Supplemental Table S1. Arabidopsis circadian clock, light signaling and flowering time genes.