Synergistic Binding of bHLH Transcription Factors to the Promoter of the Maize NADP-ME Gene Used in C4 Photosynthesis Is Based on an Ancient Code Found in the Ancestral C3 State

Abstract
C4 photosynthesis has evolved repeatedly from the ancestral C3 state to generate a carbon concentrating mechanism that increases photosynthetic efficiency. This specialized form of photosynthesis is particularly common in the PACMAD clade of grasses, and is used by many of the world’s most productive crops. The C4 cycle is accomplished through cell-type-specific accumulation of enzymes, but the cis-elements and transcription factors controlling C4 photosynthesis remain largely unknown. Using the NADP-Malic Enzyme (NADP-ME) gene as a model, we tested whether mechanisms impacting on transcription in C4 plants evolved from ancestral components found in C3 species. Two basic Helix-Loop-Helix (bHLH) transcription factors, ZmbHLH128 and ZmbHLH129, were shown to bind the C4 NADP-ME promoter from maize. These proteins form heterodimers, and ZmbHLH129 impairs trans-activation by ZmbHLH128. Electrophoretic mobility shift assays indicate that a pair of cis-elements separated by a seven base pair spacer synergistically bind either ZmbHLH128 or ZmbHLH129. This pair of cis-elements is found in both C3 and C4 Panicoid grass species of the PACMAD clade. Our analysis is consistent with this cis-element pair originating from a single motif present in the ancestral C3 state. We conclude that C4 photosynthesis has co-opted an ancient C3 regulatory code built on G-box recognition by bHLH to regulate the NADP-ME gene. More broadly, our findings also contribute to the understanding of gene regulatory networks controlling C4 photosynthesis.


Introduction
C3 plants inherited a carbon fixation system developed by photosynthetic bacteria, with atmospheric carbon dioxide (CO2) being incorporated into ribulose-1,5-bisphosphate (RuBP) by the enzyme Ribulose Bisphosphate Carboxylase/Oxygenase (RuBisCO) to form the three-carbon compound (C3) 3-phosphoglycerate (Calvin and Massini 1952). However, RuBisCO can also catalyse oxygenation of RuBP, which leads to the production of 2-phosphoglycolate, a compound that is toxic to the plant cell and needs to be detoxified through an energetically wasteful process called photorespiration (Bowes et al. 1971; Sharkey 1988; Sage 2004). The oxygenase reaction of RuBisCO becomes more common as temperature increases, and so in C3 plants photorespiration can reduce photosynthetic output by up to 30% (Ehleringer and Monson 1993). In environments such as the tropics, where rates of photorespiration are high, C4 photosynthesis has evolved repeatedly from the ancestral C3 state (Lloyd and Farquhar 1994; Osborne and Beerling 2006). Phylogenetic studies estimate that the first transition from C3 to C4 occurred around 30 million years ago (MYA) (Christin et al. 2008; Vicentini et al. 2008). The ability of the C4 cycle to concentrate CO2 around RuBisCO limits oxygenation and so increases photosynthetic efficiency in conditions where photorespiration is enhanced (Hatch and Slack 1966; Maier et al. 2011; Christin and Osborne 2014; Lundgren and Christin 2017).
The evolution of C4 photosynthesis involved multiple modifications to leaf anatomy and biochemistry (Hatch 1987; Sage 2004). In most C4 plants, photosynthetic reactions are partitioned between two distinct cell types known as mesophyll (M) and bundle sheath (BS) cells (Langdale 2011).
M and BS cells are arranged in concentric circles around veins in the so-called Kranz anatomy (Haberlandt 1904), which enables CO2 pumping from M to BS cells, where RuBisCO is specifically located. Atmospheric CO2 is first converted to HCO3− by carbonic anhydrase (CA) and then combined with phosphoenolpyruvate (PEP) by PEP carboxylase (PEPC) to produce oxaloacetate in the M cells. This four-carbon acid (C4) is subsequently converted into malate and/or aspartate, which transport the fixed CO2 from M to BS cells (Kagawa and Hatch 1974; Hatch 1987). Three biochemical C4 subtypes are traditionally described based on the predominant C4 acid decarboxylase responsible for CO2 release around RuBisCO in the BS: NADP-dependent Malic Enzyme (NADP-ME, e.g. Zea mays), NAD-dependent Malic Enzyme (NAD-ME, e.g. Gynandropsis gynandra, formerly designated Cleome gynandra) and phosphoenolpyruvate carboxykinase (PEPCK). However, recent reports suggest that only NADP-ME and NAD-ME should be considered distinct C4 subtypes, which in response to environmental cues may involve a supplementary PEPCK cycle (Rao and Dixon 2016).
The recruitment of multiple genes into C4 photosynthesis involved both an increase in their transcript levels (Hibberd and Covshoff 2010) and a shift in their expression patterns from relatively constitutive in C3 species (Maurino et al. 1997; Penfield et al. 2004; Taylor et al. 2010; Brown et al. 2011; Maier et al. 2011) to M- or BS-specific in C4 plants (Hibberd and Covshoff 2010). Therefore, considerable efforts have been made to identify the transcription factors (TFs), and the cis-elements they recognize, that are responsible for this light-dependent and cell-specific gene expression (Hibberd and Covshoff 2010). Various studies suggest that different transcriptional regulatory mechanisms have been adopted during C3 to C4 evolution. One is the acquisition of novel cis-elements in C4 gene promoters that can be recognized by TFs already present in C3 plants (Matsuoka et al. 1994; Ku et al. 1999; Nomura et al. 2000); a second is the acquisition of novel or modified TFs responsible for the recruitment of genes into the C4 pathway through cis-elements that pre-exist in C3 plants (Patel et al. 2006; Brown et al. 2011; Kajala et al. 2012).
A small number of cis-elements found in different gene regions have been shown to be sufficient for the M- or BS-specific expression of C4 genes. For example, a 41-base pair (bp) cis-element, Mesophyll Expression Module 1 (MEM1), was identified from the PEPC promoter of C4 Flaveria trinervia and shown to be necessary and sufficient for M cell-specific accumulation of PEPC transcripts in C4 Flaveria species (Gowik et al. 2004). A MEM1-like cis-element has also been found in the C4 carbonic anhydrase (CA3) promoter of Flaveria bidentis and shown to drive M cell-specific expression (Gowik et al. 2017). A second cis-element, named MEM2 and consisting of 9 bp from untranslated regions, has also been shown to be capable of directing M-specificity in C4 G. gynandra (Kajala et al. 2012; Williams et al. 2016). Lastly, in the case of the NAD-ME gene from C4 G. gynandra, a region from the coding sequence generates BS-specificity (Brown et al. 2011). In contrast to these insights into cis-elements that control cell-specific expression in the C4 leaf, no TFs recognizing these cis-elements have yet been identified.
To address this gap in our understanding, a bottom-up approach was initiated to identify TFs that regulate the maize gene ZmC4-NADP-ME (GRMZM2G085019), which encodes the Malic Enzyme responsible for releasing CO2 in the BS cells. Using yeast one-hybrid (Y1H) screening, two maize TFs belonging to the basic Helix-Loop-Helix (bHLH) superfamily, ZmbHLH128 and ZmbHLH129, were identified and functionally characterized. We show that these TFs bind two cis-elements synergistically, and analysis of NADP-ME promoters from grass species of the BEP and PACMAD (Panicoideae subfamily) clades indicated that this regulation is likely derived from an ancestral G-box that is present in C3 species.

Results
ZmbHLH128 and ZmbHLH129 Homeologs Bind FAR1/FHY3 Binding Site cis-Elements in the ZmC4-NADP-ME Promoter
To identify TFs that interact with the ZmC4-NADP-ME gene (GRMZM2G085019), we studied the promoter region comprising 1982 bp upstream of the translational start site. This region was divided into six overlapping fragments ranging from 235 to 482 bp in length (supplementary table S1, Supplementary Material online) and used in Yeast One-Hybrid (Y1H) assays. Each fragment was used to generate one yeast bait strain that was then used to screen a maize cDNA expression library. After screening at least 1.3 million colonies for each region of the promoter, two maize bHLH TFs known as ZmbHLH128 and ZmbHLH129 were identified. Both of these TFs bind the promoter between base pairs −389 and −154 relative to the predicted translational start site of ZmC4-NADP-ME (fig. 1A). These interactions were confirmed by re-transforming yeast bait strains harbouring each of the six sections of the promoter with cDNAs encoding ZmbHLH128 and ZmbHLH129. Consistent with the initial findings, ZmbHLH128 and ZmbHLH129 only activated expression of the HIS3 reporter when transformed into yeast containing the fragment −389 to −154 bp upstream of ZmC4-NADP-ME (fig. 1B, supplementary fig. S1, Supplementary Material online).
FIG. 1. ZmbHLH128 and ZmbHLH129 homeologs bind the ZmC4-NADP-ME promoter. (A) Schematic representation of the ZmC4-NADP-ME promoter, divided into fragments used as baits in Y1H screenings, and the ZmbHLH TFs identified. ATG and TAG are the translational start codon and the stop codon of the ZmC4-NADP-ME ORF, respectively. The position of the ZmbHLHs on the scheme indicates that they bind between base pairs −389 and −154 relative to the ATG. (B) Analysis of ZmbHLH-pZmC4-NADP-ME binding specificity. Each of the six yeast bait strains was transformed with both ZmbHLHs (pAD-GAL4-2.1::TF vectors), and positive interactions were selected on CM −HIS −LEU + 3-AT [yeast Complete Minimal medium lacking histidine and leucine, and supplemented with 3-amino-1,2,4-triazole (3-AT), a competitive inhibitor of the HIS3 gene product]. (C) Schematic representation of bHLH and ZIP protein domains, and their respective positions in the protein sequences. (D) Schematic representation of ZmbHLH128 and ZmbHLH129 (black) and four additional maize homeolog gene pairs located in syntenic regions of chromosomes 4 and 5. Homeolog genes are indicated by colour. Arrows indicate the direction of transcription of each gene. Genomic coordinates are from the B73 RefGen_v3 assembly.

Borba et al. doi:10.1093/molbev/msy060 MBE

ZmbHLH128 and ZmbHLH129 possess a bHLH domain followed by a contiguous leucine zipper (ZIP) motif (fig. 1C). This bHLH domain is highly conserved between both ZmbHLHs and consists of 61 amino acids that can be separated into two functionally distinct regions: a basic region at the N-terminal end of the bHLH domain, involved in DNA binding, and a Helix-Loop-Helix region towards the carboxy-terminus, mediating dimerization (fig. 1C) (Murre et al. 1989; Toledo-Ortiz et al. 2003). In addition to the canonical E-box (5′-CANNTG-3′) and G-box (5′-CACGTG-3′) motifs (Massari and Murre 2000; Li et al. 2006), this family of TFs has also been shown to bind N-box (5′-CACGCG-3′), N-box B (5′-CACNAG-3′) and FBS (FAR1/FHY3 Binding Site, 5′-CACGCGC-3′) motifs (Sasai et al. 1992; Ohsako et al. 1994; Fisher and Caudy 1998; Kim et al. 2016). Therefore, the ZmC4-NADP-ME promoter was assessed for additional cis-elements to which ZmbHLH128 and ZmbHLH129 might bind. A total of eight such cis-elements were found, consisting of two N-boxes B, two N-boxes, one G-box, two FBSs, and one E-box (fig. 2A). Electrophoretic Mobility Shift Assays (EMSA) were used to test whether ZmbHLH128 and ZmbHLH129 were able to interact with each of these cis-elements in vitro (fig. 2B and C). Consistent with the Y1H findings, EMSA showed that recombinant Trx::ZmbHLH128 and Trx::ZmbHLH129 proteins caused an uplift (mobility shift) of radiolabeled probes containing FBS cis-elements (probes 6, 7, and 6 + 7) (fig. 2C), positioned between nucleotides −389 and −154 relative to the predicted translational start site (see fig. 1A). ZmbHLH128 also showed weak binding to probe 3, which contained an N-box cis-element that was not bound by ZmbHLH128 or ZmbHLH129 in Y1H (see fig. 1B); the signal intensity was similar to that observed for probe 7 (fig. 2C). However, we cannot exclude that the relatively weak binding to probe 7 is due to it being three nucleotides shorter than the other probes (fig. 2B). Trx alone and OsPIF14 (a bHLH known to bind the N-box motif; Cordeiro et al. 2016) were used as negative controls (fig. 2C). The two FBS motifs in probe 6 + 7 are separated by a short 7 bp spacer sequence and are found in opposite orientations (fig. 2B). The increase in band intensity detected when both cis-elements were combined (fig. 2C) suggests that they function synergistically. Overall, these data indicate that ZmbHLH128 and ZmbHLH129 target 21 bp of DNA sequence (7 bp FBS, 7 bp spacer, and 7 bp FBS).
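The 21 bp target described above (an FBS, a 7 bp spacer, and a second FBS in opposite orientation) is simple enough to search for computationally. The Python sketch below is illustrative only and is not the analysis pipeline used in this study; the function name `find_inverted_fbs_pairs` and the synthetic test sequence are hypothetical, and only exact matches to the FBS consensus (5′-CACGCGC-3′) are considered.

```python
import re

FBS = "CACGCGC"  # FAR1/FHY3 Binding Site consensus, 5'->3'


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


FBS_RC = revcomp(FBS)  # the FBS read on the opposite strand: "GCGCGTG"


def find_inverted_fbs_pairs(seq: str, spacer: int = 7) -> list[tuple[int, str]]:
    """Return (0-based start, module) for FBS + spacer + FBS in opposite orientations.

    Both arrangements on the given strand are searched (FBS ... FBS_RC and
    FBS_RC ... FBS); a zero-width lookahead also reports overlapping modules.
    """
    seq = seq.upper()
    hits = []
    for left, right in ((FBS, FBS_RC), (FBS_RC, FBS)):
        pattern = re.compile(f"(?=({left}[ACGT]{{{spacer}}}{right}))")
        hits.extend((m.start(), m.group(1)) for m in pattern.finditer(seq))
    return sorted(hits)


# Synthetic example (not a real promoter): FBS, 7 bp spacer, inverted FBS.
demo = "TT" + FBS + "ATATATA" + FBS_RC + "AA"
print(find_inverted_fbs_pairs(demo))  # [(2, 'CACGCGCATATATAGCGCGTG')]
```

Because the reverse complement of an inverted FBS pair is again an inverted FBS pair of the same arrangement, scanning one strand for both arrangements is enough to cover both strands.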

ZmbHLH128 and ZmbHLH129 Form Both Homo- and Heterodimers and ZmbHLH129 Impairs trans-Activation by ZmbHLH128
Because ZmbHLH128 and ZmbHLH129 bind FBS cis-elements in close proximity and also possess domains mediating protein dimerization, we next investigated whether these proteins form homo- and/or heterodimers. In vitro, the recombinant Trx::ZmbHLH128 and Trx::ZmbHLH129 proteins formed homodimers (fig. 3A). To confirm this interaction in vivo, as well as to test for heterodimerization, Bimolecular Fluorescence Complementation (BiFC) assays were performed in maize protoplasts. While negative controls produced no YFP fluorescence, ZmbHLH128 and ZmbHLH129 formed both homo- and heterodimers (fig. 3B). In each case the YFP signal was specifically localized to the nucleus, with the exception of ZmbHLH129 homodimers, whose signal extended to the cytoplasm and plasma membrane (fig. 3B). Nuclear localization of these ZmbHLH proteins supports their roles as transcriptional regulators.
To test the capacity of ZmbHLH128 and ZmbHLH129 to regulate transcription, transient expression assays were performed in leaves of Nicotiana benthamiana. The GUS reporter gene driven by the fragment of pZmC4-NADP-ME to which ZmbHLH128 and ZmbHLH129 bind was used as the reporter, while the full-length ZmbHLH128 and ZmbHLH129 CDS sequences driven by the constitutive CaMV35S promoter were used as effectors (fig. 4A). Co-infiltration of this reporter with the ZmbHLH128 effector resulted in an increase in GUS activity, indicating that ZmbHLH128 can act as a transcriptional activator (fig. 4B). In contrast, ZmbHLH129 showed no intrinsic trans-activation activity (fig. 4C). To test whether ZmbHLH128-ZmbHLH129 heterodimers had a different trans-activation activity from ZmbHLH128 or ZmbHLH129 homodimers, leaves were co-infiltrated with the reporter and both effectors simultaneously. Interestingly, the trans-activation activity observed for ZmbHLH128 alone (fig. 4B) was lost when this TF was co-expressed with its homeolog ZmbHLH129 (fig. 4D).
The G-Box-Based cis-Element Pair Recognized by ZmbHLH128 and ZmbHLH129 in NADP-ME Promoters Operates Synergistically
To understand whether the two FBS cis-elements identified in the promoter of ZmC4-NADP-ME (see fig. 2) are associated with the evolution of C4 photosynthesis, we investigated whether they are conserved in promoters of other NADP-MEs from C3 and C4 grass species. Three C3 species (Dichanthelium oligosanthes, Oryza sativa, and Brachypodium distachyon) and three C4 species (Z. mays, Sorghum bicolor, and Setaria italica) were assessed (fig. 5A). Among the C4 species, Z. mays and S. bicolor possess two plastidic NADP-ME isoforms: one that is used in C4 photosynthesis (C4-NADP-ME, GRMZM2G085019 and Sobic.003g036200) and a second one not involved in the C4 cycle (nonC4-NADP-ME, GRMZM2G122479 and Sobic.009g108700) (Alvarez et al. 2013; Emms et al. 2016). In contrast, S. italica possesses only one plastidic NADP-ME isoform, which is used in the C4 cycle (C4-NADP-ME, Si000645) (Alvarez et al. 2013; Emms et al. 2016).
Although no cis-elements homologous to the FBSs in the ZmC4-NADP-ME promoter were detected in C3 B. distachyon, one G-box was found in O. sativa in the same position as FBS 1 from Z. mays. Moreover, in the other promoters, cis-elements that can bind bHLH proteins were present in pairs (fig. 5A). In both the C3 and C4 grasses these cis-element pairs flank a spacer that is highly conserved in sequence and length (7-9 bp) (fig. 5A). The C4-NADP-ME promoters from Z. mays and S. bicolor share a common mutation in the third nucleotide position of the alignment (A→G) (fig. 5A). Two additional mutations are specific to Z. mays (the first and last nucleotides of FBS 1 and FBS 2, respectively), while one is S. bicolor-specific (C→T at the fourth position) (fig. 5A). It is possible that the mutations unique to Z. mays or S. bicolor are neutral and that the main impact on C4-NADP-ME gene expression is due to the mutation in the third nucleotide in the common ancestor of Z. mays and S. bicolor. Alternatively, both this mutation in the last common ancestor and the species-specific modifications may have affected C4-NADP-ME gene expression.
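The observation that bHLH-binding cis-elements occur in pairs flanking a 7-9 bp spacer suggests a simple screen that could be applied to any promoter sequence. The sketch below scans both strands for the consensus motifs named in this paper (G-box, N-box, N-box B, and FBS) and reports pairs separated by 7-9 bp. It is a hypothetical illustration, not the method used to build fig. 5A: FeRE1 and N-box-like elements are omitted because their sequences are not given here, and only exact consensus matches are reported.

```python
import re
from itertools import combinations

# Consensus motifs named in the text, 5'->3' (IUPAC N = any base).
MOTIFS = {
    "G-box": "CACGTG",
    "N-box": "CACGCG",
    "N-box B": "CACNAG",
    "FBS": "CACGCGC",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, motif name, strand) for every hit, 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        sites = [("+", motif)]
        if revcomp(motif) != motif:  # palindromes (e.g. the G-box) are reported once
            sites.append(("-", revcomp(motif)))
        for strand, site in sites:
            regex = site.replace("N", "[ACGT]")
            for m in re.finditer(f"(?={regex})", seq):
                hits.append((m.start(), m.start() + len(site), name, strand))
    return sorted(hits)


def motif_pairs(seq: str, min_spacer: int = 7, max_spacer: int = 9):
    """Return (hit_a, hit_b, spacer) for hits separated by min_spacer..max_spacer bp."""
    pairs = []
    for a, b in combinations(scan(seq), 2):
        spacer = b[0] - a[1]  # bases between the end of a and the start of b
        if min_spacer <= spacer <= max_spacer:
            pairs.append((a, b, spacer))
    return pairs


# Synthetic example: a G-box, a 7 bp spacer, then an N-box on the minus strand.
demo = "AA" + "CACGTG" + "TTTAAAT" + "CGCGTG" + "AA"
for a, b, spacer in motif_pairs(demo):
    print(a, b, spacer)  # (2, 8, 'G-box', '+') (15, 21, 'N-box', '-') 7
```

Because the FBS consensus contains the N-box (CACGCG), each FBS yields both an FBS hit and an N-box hit, so nested hits would need to be collapsed before pairs are counted.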
To test whether ZmbHLH128 and ZmbHLH129 bind the cis-elements identified in these additional species, EMSA was performed on each cis-element separately as well as on the cis-element pairs found in each NADP-ME promoter (fig. 5B and C, supplementary table S3, Supplementary Material online). ZmbHLH128 and ZmbHLH129 showed low binding affinity for the single G-box identified in the O. sativa promoter (probe 13), and binding affinity was not increased by mutating the G-box to a canonical N-box (probe m13) (fig. 5B and C). This low binding affinity for single G-box cis-elements was consistent for all the NADP-ME promoters containing G-boxes (probes 5, 7, 9, and 11) (fig. 5B and C). Although neither ZmbHLH showed binding affinity for the additional N-boxes or N-box-like elements alone (probes 6, 8, 10, and 12) (fig. 5B and C), when these additional motifs were acquired and formed a pair with the ancestral G-box, binding affinity increased (probes 5 + 6, 7 + 8, 9 + 10, and 11 + 12), leading to a greater uplift than with the G-boxes alone (probes 5, 7, 9, and 11) (fig. 5B and C). Given the similar length of probes 1, 2, 1 + 2, 5, 7, 9, and 11 (24-30 bp) (supplementary table S3, Supplementary Material online), it is possible that this difference in migration of ZmbHLH-probe complexes results from the binding of bHLH to G-boxes in a lower oligomeric state (supplementary fig. S2, Supplementary Material online), which based on the literature must be dimers (De Masi et al. 2011). Strong binding of cis-element pairs was also observed when the ancestral G-box had evolved into either the FBS or FeRE1 elements found in C4 Z. mays and S. bicolor (probes 1 + 2 and 3 + 4) (fig. 5B and C). In the C4 Z. mays promoter, both ZmbHLHs showed binding affinity for single FBS cis-elements. Since ZmbHLH128 and ZmbHLH129 showed weak binding to single cis-elements, we tested their binding by mutating these cis-elements in probes containing the pairs (supplementary fig. S3, Supplementary Material online). For each pair, three mutant probes were designed: two in which the two cis-elements were mutated individually (keeping one cis-element wild-type) and one in which both cis-elements were mutated simultaneously (supplementary table S3).

Given the in vitro binding of ZmbHLH128 and ZmbHLH129 to the G-box in the ZmnonC4-NADP-ME promoter (probes 7 and 7 + 8, fig. 5C), we tested their binding ability in planta. Transient expression assays were performed in leaves of N. benthamiana co-infiltrated with the GUS reporter gene driven by a ZmnonC4-NADP-ME promoter fragment containing the G- and N-box-like cis-element pair (−368 to −143 bp) and the ZmbHLH128 and ZmbHLH129 effector constructs (supplementary fig. S4A, Supplementary Material online). Compared with the reporter alone, co-infiltration of the ZmnonC4-NADP-ME reporter with the ZmbHLH128 and ZmbHLH129 effectors did not affect GUS activity in N. benthamiana (supplementary fig. S4B-D, Supplementary Material online). These results suggest that although ZmbHLH128 on its own binds both the ZmC4-NADP-ME and ZmnonC4-NADP-ME promoters in vitro (probes 1, 2, 1 + 2, 7, and 7 + 8, fig. 5B and C), this might not be the case in planta (supplementary fig. S4, Supplementary Material online).
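The mutant-probe design described above (each element of a pair mutated alone, and both mutated together) can be generated systematically. In this hypothetical sketch, every base of a cis-element is replaced by a transversion (A↔C, G↔T); this substitution scheme, the coordinates, and the test probe are assumptions for illustration and are not the mutations listed in supplementary table S3.

```python
# Assumed substitution scheme: A<->C and G<->T transversions.
# Every base changes, so the core motif is destroyed while probe length is kept.
TRANSVERSION = str.maketrans("ACGT", "CATG")


def mutate(probe: str, start: int, end: int) -> str:
    """Return probe with bases [start, end) replaced by their transversions."""
    return probe[:start] + probe[start:end].translate(TRANSVERSION) + probe[end:]


def mutant_series(probe: str, site1: tuple[int, int], site2: tuple[int, int]) -> dict[str, str]:
    """Wild-type probe plus three mutants: site 1 only, site 2 only, both sites."""
    return {
        "wt": probe,
        "m1": mutate(probe, *site1),
        "m2": mutate(probe, *site2),
        "m1+m2": mutate(mutate(probe, *site1), *site2),
    }


# Hypothetical probe: FBS, 7 bp spacer, inverted FBS (0-based, end-exclusive sites).
for name, seq in mutant_series("CACGCGCATATATAGCGCGTG", (0, 7), (14, 21)).items():
    print(f"{name:6} {seq}")
```

Keeping the spacer untouched in every mutant means that any loss of binding can be attributed to the mutated cis-element rather than to a change in spacing.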
Acquisition of N-Box-Derived cis-Elements in NADP-ME Promoters Facilitates ZmbHLH128 and ZmbHLH129 Binding in PACMAD Panicoid Grasses
Phylogenetic analysis of the genes encoding C3 and C4 plastidic NADP-MEs reflects the previously reported grass species phylogeny (fig. 6A) (Grass Phylogeny Working Group II 2012). It inferred two main clades: one formed by the C3 BEP species (B. distachyon and O. sativa) and a second formed by C3 (D. oligosanthes) and C4 Panicoid species of the PACMAD clade (S. italica, S. bicolor, and Z. mays) (fig. 6A).
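The fig. 6 caption states that the tree was inferred with the neighbor-joining (NJ) method. As a reminder of how NJ works, the following is a minimal, hypothetical Python sketch of the algorithm applied to a toy four-taxon distance matrix. It omits the MUSCLE alignment, distance estimation, and bootstrapping steps, and the taxon labels and distances are invented for illustration rather than taken from the NADP-ME data.

```python
def neighbor_joining(names: list[str], matrix: list[list[float]]) -> str:
    """Build an unrooted NJ tree and return it as a Newick string."""
    # D[a][b] holds the current distance between clusters a and b; internal
    # nodes are labelled by their own Newick substring.
    D = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from a and b to their new parent node u.
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                D[u][k] = D[k][u] = 0.5 * (D[a][k] + D[b][k] - D[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    x, y = nodes
    return f"({x},{y}:{D[x][y]:g});"


# Toy additive distances generated from the tree ((A:2,B:3):1,(C:4,D:2)).
taxa = ["A", "B", "C", "D"]
dist = [
    [0, 5, 7, 5],
    [5, 0, 8, 6],
    [7, 8, 0, 6],
    [5, 6, 6, 0],
]
print(neighbor_joining(taxa, dist))  # ((A:2,B:3),(C:4,D:2):1);
```

For additive distances such as these, NJ recovers the generating topology and branch lengths exactly. Distances estimated from real sequence alignments are rarely additive, so support for each node is usually assessed by bootstrapping, as reported in fig. 6A.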
Based on the observed nucleotide modifications in cis-elements recognized by bHLH TFs, we propose a model for the recruitment of NADP-ME into C4 photosynthesis in grasses (fig. 6B). In this model, an ancestral G-box found in the NADP-ME promoter of the common ancestor of C3 BEP O. sativa and C4 Panicoid grasses has been conserved during the evolution of C4 photosynthesis. However, in the Panicoideae subfamily of the PACMAD clade a second cis-element recognized by bHLH is present, such that the NADP-ME gene from the C3 species D. oligosanthes and the genes encoding plastidic nonC4-NADP-ME from C4 S. bicolor and Z. mays all contain a G- and N-box/N-box-like pair. In C4 S. italica this cis-code has been retained in the C4-NADP-ME, but in S. bicolor and Z. mays the original G-box has evolved to become either a FeRE1 or an FBS element, respectively (fig. 6B). No G-box motifs are, however, present in the promoters of genes encoding cytosolic NADP-ME from S. bicolor and Z. mays. Overall, these results suggest that the acquisition of N-box-derived cis-elements has facilitated ZmbHLH128 and ZmbHLH129 binding to promoters of genes encoding plastidic NADP-ME in the PACMAD clade (Panicoideae subfamily).

Discussion
ZmbHLH128 and ZmbHLH129 Homeologs Interact with Maize C4- and nonC4-NADP-ME Promoters in vitro Showing Different trans-Activation Activity in planta
In this study, we showed that ZmbHLH128 and ZmbHLH129 form a maize homeolog pair resulting from the recent maize whole genome duplication (WGD) event that occurred 5-12 million years ago. This WGD occurred 5-16 million years after C4 photosynthesis evolved in the Andropogoneae tribe of the PACMAD clade (17-21 MYA) (Christin et al. 2008, 2009). As the lengths of exons 1 and 2 and the total number of amino acids in the mature protein of ZmbHLH128 are more similar to those of the sorghum ortholog SbbHLH66 (supplementary fig. S5, Supplementary Material online), we propose that ZmbHLH129 has diverged more from the ancestral gene. Both of these TFs bind two FBS cis-elements that are in close proximity in the maize C4-NADP-ME (GRMZM2G085019) promoter. Although ZmbHLH128 has been predicted in silico to regulate C4 photosynthesis, as far as we are aware this is the first report of its functional characterization. ZmbHLH128 alone activates ZmC4-NADP-ME gene expression, while ZmbHLH129 alone shows no trans-activation activity on this promoter. As the duplication event that generated ZmbHLH129 took place after the evolution of C4 photosynthesis, it seems possible that this gene is not required for C4 photosynthesis. ZmbHLH128 and ZmbHLH129 form heterodimers, and although ZmbHLH128 activates the expression of ZmC4-NADP-ME, its regulatory activity is impaired by its homeolog ZmbHLH129. To explain this impairment, we hypothesize two scenarios that may occur in vivo: either ZmbHLH128 and ZmbHLH129 act as heterodimers and ZmbHLH128 loses its DNA binding activity when combined with ZmbHLH129, or they act as homodimers and compete directly for the same FBSs, for which ZmbHLH129 has a higher binding affinity.
The former scenario has been described for bZIP TFs from Arabidopsis, where bZIP63 has negative effects on the formation of bZIP1-DNA complexes, probably due to conformational differences between bZIP1 homodimers and bZIP1-bZIP63 heterodimers (Kang et al. 2010). The latter scenario has been reported for the maize Dof1 and Dof2 TFs. Dof1 is a transcriptional activator of light-regulated genes in leaves; however, in stems and roots this TF is not able to regulate those genes, since the repressor Dof2 is expressed there and blocks Dof-specific cis-elements (Yanagisawa and Sheen 1998).
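The interval quoted at the start of this section, that the maize WGD occurred 5-16 million years after C4 photosynthesis evolved in the Andropogoneae, follows from subtracting the endpoints of the two quoted date ranges (C4 origin 17-21 MYA; WGD 5-12 MYA). The small helper below, whose name is hypothetical, makes that arithmetic explicit; it treats each quoted range as hard bounds and ignores the underlying dating uncertainty.

```python
def elapsed_range(earlier_mya: tuple[float, float], later_mya: tuple[float, float]) -> tuple[float, float]:
    """Min and max time elapsed between two events dated as (youngest, oldest) MYA."""
    (e_young, e_old), (l_young, l_old) = earlier_mya, later_mya
    # Shortest gap: youngest date for the earlier event, oldest for the later one.
    # Longest gap: oldest date for the earlier event, youngest for the later one.
    return (e_young - l_old, e_old - l_young)


c4_origin_andropogoneae = (17, 21)  # MYA, as quoted in the text
maize_wgd = (5, 12)  # MYA, as quoted in the text
print(elapsed_range(c4_origin_andropogoneae, maize_wgd))  # (5, 16)
```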
FIG. 6. Acquisition of N-box-derived cis-elements in NADP-ME promoters facilitates ZmbHLH128 and ZmbHLH129 binding in PACMAD Panicoid grasses. (A) Phylogenetic tree of genes encoding plastidic NADP-ME from C3 and C4 grass species. C3: B. distachyon (Bd), O. sativa (Os), and D. oligosanthes (Do); C4: S. italica (Si), S. bicolor (Sb) and Z. mays (Zm). NADP-MEs are color-coded: magenta for C3, blue for nonC4 and green for C4. NADP-ME genomic sequences were aligned using MUSCLE, and the phylogenetic tree was inferred by the neighbor-joining (NJ) method (1000 bootstrap pseudoreplicates; node numbers indicate bootstrap values). The gene encoding C3 plastidic NADP-ME from A. thaliana (AtC3-NADP-ME) was used as outgroup. (B) Diagram representing C3 to C4 molecular evolution of homologous bHLH binding cis-elements identified in promoters of genes encoding plastidic NADP-ME. The dashed arrow indicates intermediate evolutionary steps from C3 to C4. Vertical lines indicate the two independent C4 origins of S. italica and S. bicolor/Z. mays (Paniceae and Andropogoneae tribes, respectively).

In addition to interacting with the FBSs found in the maize C4-NADP-ME promoter, both ZmbHLHs were shown to bind in vitro to the promoter of maize nonC4-NADP-ME (GRMZM2G122479), which possesses the G- and N-box-like cis-element pair. In planta, however, ZmbHLH128 and ZmbHLH129 showed no trans-activation activity on this promoter. It is well known that primary DNA sequence and its structural properties are determinants of DNA binding specificity in vivo (Rohs et al. 2009), and so it is possible that both ZmbHLHs display higher in vivo binding specificity for the FBS pair in the ZmC4-NADP-ME promoter than for the G- and N-box-like pair in the ZmnonC4-NADP-ME promoter.
Therefore, ZmbHLH128 seems to affect the level of NADP-ME expression, as it activates the ZmC4-NADP-ME promoter through the pair formed by two FBSs, but the same trend was not observed for the ZmnonC4-NADP-ME promoter with the G- and N-box pair. In addition, we hypothesize that these modifications of promoter sequences may also affect light/circadian regulation of the ZmC4-NADP-ME gene, as FBS cis-elements have been described in promoters of circadian-clock-regulated and light-responsive genes (Lin et al. 2007, 2011; Kim et al. 2016). Mutation of two closely spaced FBSs in the promoter of the circadian-clock gene EARLY FLOWERING 4 (ELF4) proved sufficient to abolish its rhythmic expression (Li et al. 2011). More broadly, our findings also contribute to the understanding of gene regulatory networks controlling C4 photosynthesis.

The G-Box-Based cis-Element Pair Present in NADP-ME Promoters Synergistically Binds Either ZmbHLH128 or ZmbHLH129
We identified a cis-element pair recognized by bHLH TFs that occupies homologous positions in NADP-ME promoters from C3 and C4 grasses. These cis-elements flank a short spacer and operate synergistically to facilitate interaction with ZmbHLH128 and ZmbHLH129. We suggest a mechanism by which these TFs may be recruited to the cis-elements associated with C4 photosynthesis. We propose that one cis-element is sufficient to recruit a bHLH homodimer (G-box) or tetramer (N-box or FBS in promoters where the ancestral G-box is no longer present); however, the presence of a second cis-element in the vicinity increases bHLH binding affinity (supplementary fig. S2, Supplementary Material online). It is possible that both cis-elements are brought together through the interaction with a bHLH tetramer formed by two dimers, which may involve DNA bending (supplementary fig. S2, Supplementary Material online). Therefore, this cis-element pair could operate synergistically to stabilize bHLH binding. This mechanism of TF-DNA assembly has previously been proposed for MADS-domain TFs, which can bind two nearby CArG boxes through DNA looping and formation of tetrameric complexes (Theissen 2001; Theissen and Saedler 2001; Melzer et al. 2009; Smaczniak et al. 2012, 2017). In this case, and consistent with our results, MADS-domain TFs were found to bind single CArG boxes either as dimers or tetramers; however, when their target gene promoters contain CArG box pairs they bind as tetramers (Smaczniak et al. 2012). It has been proposed that the probability of DNA loop formation increases with shorter distances between cis-elements due to the low elastic bending energy required to bring the protein dimers together (Agrawal et al. 2008). Interestingly, in all NADP-ME promoters assessed in this study except those of rice and Brachypodium, the two cis-elements were found to be in close proximity, which may encourage DNA looping.
In addition to the spacer length, its sequence appears highly conserved. This is consistent with evidence suggesting that nucleotides outside core cis-elements affect TF binding specificity by providing genomic context and influencing three-dimensional structure (Atchley et al. 1999; Martínez-García et al. 2000; Grove et al. 2009; Gordân et al. 2013). For example, Cbf1 and Tye7 are yeast bHLHs that show preference for a subset of the G-boxes present throughout the yeast genome (Gordân et al. 2013). These differences in binding preference were observed not only in vivo but also in vitro, and DNA sequences flanking the core G-boxes were found to explain this differential bHLH-G-box binding (Gordân et al. 2013).
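Because flanking nucleotides can influence which G-boxes a bHLH prefers, a natural first step when comparing promoters is to tabulate the bases immediately around each core motif. The sketch below is a hypothetical helper, not the approach of Gordân et al. (2013); the function name, the 3 bp flank width, and the example sequence are arbitrary choices for illustration.

```python
import re

GBOX = "CACGTG"  # palindromic core: a site on one strand is also a site on the other


def gbox_flanks(seq: str, flank: int = 3) -> list[tuple[int, str, str, str]]:
    """Return (start, left flank, core, right flank) for each G-box with full flanks.

    Flanks are read on the given strand; sites too close to either end of the
    sequence to have `flank` bases on both sides are skipped.
    """
    seq = seq.upper()
    out = []
    for m in re.finditer(f"(?={GBOX})", seq):
        start, end = m.start(), m.start() + len(GBOX)
        if start >= flank and end + flank <= len(seq):
            out.append((start, seq[start - flank:start], GBOX, seq[end:end + flank]))
    return out


# The second G-box has only 2 bp on its right, so it is skipped.
print(gbox_flanks("ttaCACGTGgcaAACACGTGTT"))  # [(3, 'TTA', 'CACGTG', 'GCA')]
```

Flank tables produced this way could then be compared across the C3 and C4 NADP-ME promoters, keeping in mind that for a palindromic core the flanks on the opposite strand are simply the reverse complements.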
The mechanism proposed here for how bHLH TFs interact with their target cis-elements suggests that these DNA sequences are not randomly arranged in gene promoters and may affect how cis-element specificity is achieved. Indeed, in some promoters bound by bHLH TFs, two or more cis-elements were found to be clustered. For example, two overlapping FBSs were reported in the 400 bp upstream of the translational start site of the gene encoding ELF4 (Li et al. 2011). Also, pairs of G- and N-boxes were found to be highly enriched in promoters targeted by the bHLH PIF1 (Kim et al. 2016). It is possible that multiple cis-elements serve to recruit additional TFs for cooperative binding in vivo.

C4 Photosynthesis Co-Opted an Ancient C3 Cis-Regulatory Code Built on G-Box Recognition by bHLH Transcription Factors
Finally, from this study we propose a model that summarizes how molecular evolution of cis-elements recognized by bHLHs may relate to the recruitment of NADP-ME into C4 photosynthesis. C4 photosynthesis is an excellent example of convergent evolution (Christin et al. 2013), as it has evolved independently over 60 times in angiosperms (Sage 2016) and at least 22 times in grasses (Grass Phylogeny Working Group II 2012). How this repeated evolution has come about is not fully understood. Our model contributes to our understanding of C4 evolution and is based on the following findings. First, in rice, which belongs to the BEP clade that contains no C4 species, only one copy of a G-box was present in the NADP-ME promoter. In contrast, cis-element pairs recognized by ZmbHLH128 and ZmbHLH129 in NADP-ME promoters seem to be common in the Panicoideae subfamily of the PACMAD clade, which contains independent C4 lineages. For example, in the PACMAD Panicoid grasses a G- and N-box pair was identified in C3 D. oligosanthes (Do024386) and appears to be reasonably conserved in C4 species. However, in the C4-NADP-MEs from S. bicolor and Z. mays (Sobic.003g036200 and GRMZM2G085019) these elements have diversified. Both of these grass species belong to the C4 tribe Andropogoneae, in which the plastidic NADP-ME isoform that is used in C4 photosynthesis (C4-NADP-ME) evolved by duplication from an ancestral plastidic NADP-ME that still exists and is not involved in the C4 cycle (nonC4-NADP-ME, Sobic.009g108700 and GRMZM2G122479) (Tausta et al. 2002; Maier et al. 2011; Alvarez et al. 2013). In contrast, C4 S. italica and C3 D. oligosanthes belong to the grass tribe Paniceae, in which only one plastidic NADP-ME isoform is known to exist (Si000645 and Do024386) (Alvarez et al. 2013; Emms et al. 2016).
Surprisingly, the cis-element pair identified in the C4-NADP-ME promoter from S. italica (G- and N-box) was found to be closer to those occurring in the C3 and nonC4-NADP-ME promoters from D. oligosanthes, S. bicolor, and Z. mays (G- and N-box/N-box-like) than to those occurring in the C4-NADP-ME promoters from S. bicolor and Z. mays (FeRE1 and N-box, or FBS and FBS, respectively). A similar trend has previously been observed (Alvarez et al. 2013) and may be explained by the independent evolutionary origins of C4 photosynthesis in the grass tribes containing S. italica (Paniceae) and S. bicolor/Z. mays (Andropogoneae).
Taken together, our findings suggest that an ancestral G-box in combination with N-box-derived cis-elements forms the basis of the synergistic binding of either ZmbHLH128 or ZmbHLH129 to NADP-ME promoters from PACMAD Panicoid grasses. Nucleotide diversity in cis-elements recognized by bHLH TFs has been suggested as one of the mechanisms underlying the complex and diverse transcriptional activity of these TFs (Toledo-Ortiz et al. 2003). We therefore cannot exclude the possibility that the promoter of the gene encoding the plastidic NADP-ME from the C3 BEP species B. distachyon (BRADI2g05620) is also bound by ZmbHLH128 or ZmbHLH129, even though none of the typical cis-elements recognized by bHLHs was identified in it. Given recent evidence that the bHLH TF family is often recruited into the regulation of C4 photosynthesis (Huang and Brutnell 2016), we suggest that the nucleotide modifications observed in the cis-element pair of the C4-NADP-ME promoters from S. bicolor and Z. mays may underlie changes in bHLH binding specificity in vivo and, therefore, contribute to the recruitment of NADP-ME into C4 photosynthesis in the Andropogoneae tribe of the PACMAD clade. The presence of a bHLH duplicate (ZmbHLH129) that seems not to be required for C4 photosynthesis and has evolved to repress the activity of its homeolog (ZmbHLH128) is unique to maize, as this homeolog gene pair resulted from the maize WGD. We therefore hypothesize that the single orthologous bHLH in all the other Panicoid species of the PACMAD clade activates C4-NADP-ME gene expression. This agrees with the hypothesis that C4 photosynthesis has on multiple occasions made use of cis-regulators found in C3 species and, therefore, that C4 genes were recruited through minor rewiring of preexisting regulatory networks (Reyna-Llorens and Hibberd 2017).
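To make the idea of a spaced cis-element pair concrete, the sketch below scans a promoter sequence for bHLH-type sites and reports pairs separated by a fixed spacer. It is an illustration only, not the analysis used in this study. It assumes the canonical cores CACGTG for the G-box and CACGAG for the N-box (searched on both strands) and uses the seven base pair spacer reported for the ZmbHLH128/ZmbHLH129 cis-element pair. It does not model the FeRE1 or FBS variants, and the function names are hypothetical.

```python
import re

# Assumed canonical cores (not given in this section). The G-box is
# palindromic; the N-box is searched as CACGAG and as its reverse
# complement CTCGTG so that both strands are covered.
MOTIFS = {
    "G-box": ["CACGTG"],
    "N-box": ["CACGAG", "CTCGTG"],
}
CORE_LEN = 6


def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return sorted (start, motif_name) tuples for every core found."""
    seq = seq.upper()
    sites = []
    for name, variants in MOTIFS.items():
        for core in variants:
            # Zero-width lookahead so overlapping matches are also reported.
            for match in re.finditer(f"(?={core})", seq):
                sites.append((match.start(), name))
    return sorted(sites)


def find_pairs(seq: str, spacer: int = 7) -> list[tuple[int, str, int, str]]:
    """Return site pairs whose cores are separated by exactly `spacer` bp."""
    sites = find_sites(seq)
    pairs = []
    for i, (start1, name1) in enumerate(sites):
        for start2, name2 in sites[i + 1:]:
            if start2 - (start1 + CORE_LEN) == spacer:
                pairs.append((start1, name1, start2, name2))
    return pairs


print(find_pairs("AACACGTGTTTTTTTCACGAGAA"))
```

On the toy sequence above, the scan reports one G-box/N-box pair, with the G-box at position 2 and the N-box at position 15.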
We conclude that regulation of C4 genes can be based on an ancient code founded on a G-box present in the BEP clade as well as in the Panicoideae of the PACMAD clade. Acquisition of a second cis-element recognized by bHLHs in Panicoid grasses appears to have facilitated synergistic binding by either ZmbHLH128 or ZmbHLH129. Although this G-box-based cis-code has remained similar in S. italica, it has diverged in maize and sorghum. Thus, different C4 grass lineages may employ slightly different molecular circuits to regulate orthologous C4 photosynthesis genes.

Plant Growth Conditions and Collection of Leaf Samples
To construct the cDNA expression library, maize plants (Z. mays L. var. B73) were grown under a 16 h photoperiod with a light intensity of 340-350 µmol m⁻² s⁻¹, at day/night temperatures of 28/26 °C, and 70% relative humidity. Two light regimes were used: (1) 9 days in a 16 h photoperiod; and (2) 9 days in a 16 h photoperiod followed by a 72 h dark treatment. In both experiments, samples were collected under a 16 h photoperiod. Third leaves from plants grown in the first and second light regimes were harvested at Zeitgeber times (ZT) −0.5, 0.5, and 2 h, and at ZT 1, 2, 4, 8, 12, and 15.5 h, respectively. For isolation of maize mesophyll protoplasts, maize plants were grown for 10 days at 25 °C, 16 h photoperiod (60 µmol m⁻² s⁻¹), and 70% relative humidity. For transient expression assays in planta, N. benthamiana (tobacco) plants were grown for 5 weeks at 22 °C, 16 h photoperiod (350 µmol m⁻² s⁻¹), and 65% relative humidity. After agro-infiltration of tobacco leaves, plants were returned to the same growth conditions, and leaf discs (2.5 cm in diameter) were collected 96 h post-infiltration.

Construction of cDNA Expression Library
Total RNA was extracted from third leaves of maize seedlings using TRIzol reagent (Invitrogen), following the manufacturer's instructions. RNA samples from the nine time points (described in 'Plant Growth Conditions and Collection of Leaf Samples') were pooled in equal amounts for mRNA purification using the PolyATract mRNA Isolation System IV (Promega). A unidirectional cDNA expression library was prepared using the HybriZAP-2.1 XR cDNA Synthesis Kit and the HybriZAP-2.1 XR Library Construction Kit (Stratagene), following the manufacturer's instructions. Four micrograms of mRNA were used for first-strand cDNA synthesis. After in vivo excision and amplification of the pAD-GAL4-2.1 phagemid vector, this maize cDNA expression library was used to transform yeast bait strains.
Borba et al. . doi:10.1093/molbev/msy060 MBE

Yeast One-Hybrid (Y1H) Screening and Validation
Yeast bait strains were transformed with 1 µg of the maize cDNA expression library according to Ouwerkerk and Meijer (2001) and Serra et al. (2013). At least 1.3 million yeast colonies of each bait strain transformed with the library were screened on CM -HIS -LEU supplemented with 3-AT at 5 mM (−1982 to −1524 bp), 20 mM (−389 to −154 bp, −776 to −334 bp) or 75 mM (−973 to −702 bp, −1225 to −891 bp, −1617 to −1135 bp), depending on the promoter fragment used as bait. Plasmids were extracted from yeast clones that grew on selective medium. To determine whether the isolated clones encoded transcription factors (TFs), the cDNA inserts were sequenced and the results analyzed using BLAST programs. To validate DNA-TF interactions in yeast, isolated plasmids encoding TFs were re-transformed into the yeast bait strain to which they were found to bind. To assess TF binding specificity, plasmids encoding TFs were also transformed into the yeast bait strains to which they did not bind.
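Whether 1.3 million transformants per bait strain is enough to recover a given TF depends on how abundant its cDNA is in the library. A standard way to reason about this is the Clarke-Carbon relation P = 1 - (1 - f)^N, where f is the fraction of library clones carrying the cDNA and N is the number of transformants screened. The sketch below applies it. The abundance used in the example is hypothetical, because library clone frequencies are not reported here. The relation also assumes independent sampling and that f already counts only in-frame, correctly oriented inserts.

```python
import math


def prob_represented(n_transformants: int, freq: float) -> float:
    """P(a cDNA at relative frequency `freq` is sampled at least once in n draws)."""
    # 1 - (1 - f)^N, written with log1p/expm1 for accuracy when f is tiny
    return -math.expm1(n_transformants * math.log1p(-freq))


def transformants_needed(prob: float, freq: float) -> int:
    """Smallest N with prob_represented(N, freq) >= prob, i.e. ln(1 - P) / ln(1 - f)."""
    return math.ceil(math.log1p(-prob) / math.log1p(-freq))


# 1.3 million colonies per bait strain comes from the text;
# a frequency of 1 in 500,000 clones is a hypothetical example.
p = prob_represented(1_300_000, 1 / 500_000)
n99 = transformants_needed(0.99, 1 / 500_000)
print(f"P(sampled) = {p:.3f}; transformants for 99% = {n99:,}")
```

With these illustrative numbers, a transcript present in 1 of 500,000 clones has roughly a 93% chance of being sampled at least once.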

Yeast Cell Spotting
Yeast bait strains transformed with plasmids encoding TFs were grown overnight at 30 °C to log or mid-log phase in liquid CM medium supplemented with histidine (CM +HIS -LEU). Cultures were normalized to an OD600 of 0.4, spotted onto solid CM +HIS -LEU or CM -HIS -LEU + 3-AT medium, and grown for 3 days at 30 °C.
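Normalizing cultures to a common OD600 before spotting is a simple C1V1 = C2V2 dilution. The helper below is a sketch that assumes the OD600 reading lies within the spectrophotometer's linear range. The final volume and the example reading are hypothetical.

```python
def dilution_volumes(od_measured: float, od_target: float,
                     final_ul: float) -> tuple[float, float]:
    """Return (culture_ul, medium_ul) giving `final_ul` of culture at `od_target`."""
    if od_measured < od_target:
        raise ValueError("culture is less dense than the target OD600")
    culture_ul = od_target * final_ul / od_measured  # C1V1 = C2V2
    return culture_ul, final_ul - culture_ul


# Hypothetical overnight culture read at OD600 = 1.6, normalized to 0.4 (from the text)
culture, medium = dilution_volumes(1.6, 0.4, 1000.0)
print(f"{culture:.0f} µl culture + {medium:.0f} µl medium")
```

The same relation applies to the Agrobacterium suspensions used for agro-infiltration, each of which is adjusted to its own OD600 before mixing.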

Isolation and Transformation of Maize Mesophyll Protoplasts
Maize mesophyll protoplasts were isolated from 10-day-old greening maize plants and transformed according to Lourenço et al. (2013) with minor modifications. The midsections of newly matured second leaves were digested in a cell-wall digestion medium containing 1.5% (w/v) cellulase R-10 (Duchefa), 0.3% (w/v) macerozyme R-10 (Duchefa), 10 mM MES (pH 5.7), 0.4 M mannitol, 1 mM CaCl2, 0.1% (w/v) BSA and 5 mM β-mercaptoethanol. Several leaf blades were stacked, cut perpendicular to the long axis into 0.5-1 mm slices, and quickly transferred to the digestion medium (25 ml for each set of 10 leaf blades). Purity and integrity of isolated protoplasts were examined by light microscopy. Mesophyll protoplasts were counted and their density adjusted to 2 × 10⁶ protoplasts ml⁻¹. Transformed protoplasts were resuspended in 1.25 ml of incubation solution [0.6 M mannitol, 4 mM MES (pH 5.7) and 4 mM KCl] and incubated in 24-well plates for 18 h at room temperature in the dark.
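This section does not state how the protoplasts were counted. A hemocytometer is a common choice, and under that assumption, adjusting the suspension to 2 × 10⁶ protoplasts ml⁻¹ reduces to the arithmetic below. Each large hemocytometer square holds 1 mm × 1 mm × 0.1 mm = 10⁻⁴ ml, so the mean count per square multiplied by 10⁴, and by any dilution made before loading, gives protoplasts per ml. Function names and example counts are hypothetical.

```python
def protoplasts_per_ml(counts_per_square: list[int], dilution_factor: float = 1.0) -> float:
    """Density from hemocytometer counts; one large square holds 1e-4 ml."""
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution_factor * 1e4


def resuspension_volume_ml(density_per_ml: float, current_volume_ml: float,
                           target_per_ml: float = 2e6) -> float:
    """Final volume that brings all protoplasts in the suspension to `target_per_ml`."""
    total = density_per_ml * current_volume_ml
    return total / target_per_ml


# Hypothetical counts from four large squares of a sample diluted 1:2
density = protoplasts_per_ml([50, 60, 55, 55], dilution_factor=2)
volume = resuspension_volume_ml(density, current_volume_ml=5.0)
print(f"{density:.3g} protoplasts/ml -> resuspend in {volume:.2f} ml")
```

If the returned volume is smaller than the current one, as in this example, the protoplasts have to be pelleted and resuspended rather than diluted.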

Transient Expression Assays in planta
For the transient expression assays in tobacco leaves, reporter and effector constructs were generated in the Gateway binary vectors pGWB3i [pGWB3 containing an intron-tagged β-glucuronidase (GUS) open reading frame (Berger et al. 2007)] and pGWB2 (Tanaka et al. 2012), respectively.
To construct the reporter plasmids, promoter fragments of ZmC4-NADP-ME (GRMZM2G085019, from −389 to −154 bp) and ZmnonC4-NADP-ME (GRMZM2G122479, from −368 to −143 bp) were fused to a 136 bp minimal CaMV35S promoter (m35S) in a three-step PCR: (1) promoter sequences were amplified with long chimeric primers to introduce overlapping ends (the reverse primers of pZmC4-NADP-ME and pZmnonC4-NADP-ME were designed to be complementary to the forward primer of m35S) (supplementary table S4, Supplementary Material online); (2) the PCR products from step (1) were mixed in a 1:1 ratio according to the fusion products of interest [ZmC4-NADP-ME (−389 to −154 bp)::m35S and ZmnonC4-NADP-ME (−368 to −143 bp)::m35S], and 10 PCR cycles were run without primers (denaturation at 98 °C for 10 s, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min); and (3) the fusion products of interest were amplified with attB-containing primers (supplementary table S4, Supplementary Material online). To obtain Entry clones, promoter fragments fused to m35S were cloned into pDONR221 (Invitrogen) through a BP Gateway reaction (Invitrogen), following the manufacturer's instructions. Promoter sequences were then recombined into the binary vector pGWB3i through an LR Gateway reaction (Invitrogen) to obtain the final reporter constructs for promoter::GUS analysis (pZmC4-NADP-ME and pZmnonC4-NADP-ME). For the effector constructs (TF driven by the CaMV35S promoter), ZmbHLH128 and ZmbHLH129 Entry clones generated previously (see BiFC assay) were recombined directly into the binary vector pGWB2 through an LR Gateway reaction (Invitrogen).
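The chimeric-primer step can be illustrated with a short sequence sketch. Because the promoter's reverse primer carries a 5' tail complementary to the m35S forward primer, the promoter amplicon ends with the m35S forward-primer sequence. That shared stretch lets the two products anneal and extend into one fusion during the primer-free cycles. The code assumes toy sequences (not the real m35S) and a hypothetical anneal length, and it models idealized products only. The actual primers, including their attB sites, are those listed in supplementary table S4.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def chimeric_reverse_primer(promoter: str, m35s_fwd_primer: str, anneal_len: int = 20) -> str:
    """5' tail = reverse complement of the m35S forward primer;
    3' part anneals to the last `anneal_len` bases of the promoter fragment."""
    return revcomp(m35s_fwd_primer) + revcomp(promoter[-anneal_len:])


def promoter_amplicon(promoter: str, rev_primer: str, anneal_len: int = 20) -> str:
    """Idealized top strand of the step-1 product: the primer's tail is appended 3'."""
    top = revcomp(rev_primer)  # reverse primer read on the top strand
    if not top.startswith(promoter[-anneal_len:]):
        raise ValueError("reverse primer does not anneal to the promoter end")
    return promoter[:-anneal_len] + top


def overlap_fusion(left: str, right: str, overlap: str) -> str:
    """Idealized step-2 fusion: two products sharing `overlap` are joined once."""
    if not (left.endswith(overlap) and right.startswith(overlap)):
        raise ValueError("products do not share the expected overlap")
    return left + right[len(overlap):]


promoter = "TTGACCTAGGCATCGATCCAGTACGA"   # toy promoter fragment
m35s = "GATCCGTAAGCTTGCAGTCAATGCCATGG"    # toy sequence, not the real m35S
m35s_fwd = m35s[:18]
rev = chimeric_reverse_primer(promoter, m35s_fwd, anneal_len=10)
left = promoter_amplicon(promoter, rev, anneal_len=10)
fusion = overlap_fusion(left, m35s, m35s_fwd)
print(fusion)
```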
Reporter and effector constructs, together with a construct harboring the silencing suppressor P1b (Valli et al. 2006), were transformed into the Agrobacterium tumefaciens strain GV301. Overnight cultures of Agrobacterium harboring the reporter, effector and P1b constructs were pelleted (5000 × g for 15 min at 4 °C), resuspended in infiltration medium [10 mM MgCl2, 10 mM MES (pH 5.6), 200 µM acetosyringone] to an OD600 of 0.3, 1, and 0.5, respectively, and mixed in a 1:1:1 ratio. Mixed Agrobacterium cultures were incubated for 2 h at 28 °C and used to spot-infiltrate the abaxial side of 5-week-old tobacco leaves. As controls, tobacco leaves were agro-infiltrated with mixed cultures carrying either the reporter construct alone or the empty vector pGWB3i together with the effector constructs. Infiltrated leaves were analyzed 96 h post-infiltration. Leaf discs 2.5 cm in diameter were collected from the infiltrated spots and used to quantify GUS activity. GUS activity was quantified by measuring the rate of conversion of 4-methylumbelliferyl-β-D-glucuronide (MUG) to 4-methylumbelliferone (MU) as described in Jefferson et al. (1987) and Williams et al. (2016). In brief, soluble protein was extracted from agro-infiltrated tobacco leaf discs by freezing in liquid nitrogen and maceration, followed by addition of protein extraction buffer. Diluted protein extracts (1:2) were incubated with 1 mM MUG for 30, 60, 90, and 120 min at 37 °C in a 96-well plate. The reaction was stopped at each time point by adding 200 mM Na2CO3, and MU fluorescence was measured with excitation at 365 nm and emission at 455 nm. The MU concentration in each sample was interpolated from the fluorescence of an MU standard series ranging from 1.5 to 800 µM.

Blue Native-Polyacrylamide Gel Electrophoresis (BN-PAGE) and Western Blotting
The molecular mass of oligomers co-existing in purified ZmbHLH128 and ZmbHLH129 recombinant proteins was determined by blue native polyacrylamide gel electrophoresis (BN-PAGE). Two micrograms of recombinant protein (Trx::His::ZmbHLH128 or Trx::His::ZmbHLH129) were resolved on a 3-12% Novex Bis-Tris NativePAGE mini gel (Life Technologies), following the manufacturer's instructions. The HMW Native Marker Kit (66-669 kDa, GE Healthcare) was used to estimate molecular mass. Resolved proteins were transferred to a polyvinylidene difluoride membrane (GE Healthcare). The membrane was destained with 50% (v/v) methanol and 10% (v/v) acetic acid, followed by pure methanol. For immunodetection of Trx::His::ZmbHLH128 and Trx::His::ZmbHLH129, the membrane was incubated with an α-His antibody (GE Healthcare) followed by an anti-mouse horseradish peroxidase-conjugated antibody (Abcam), for 1 h each at room temperature.

Electrophoretic Mobility Shift Assay (EMSA)
DNA probes were generated by annealing oligonucleotide pairs in a thermocycler, followed by radiolabeling as described in Serra et al. (2013). DNA probe sequences and their annealing temperatures are listed in supplementary table S3, Supplementary Material online. EMSAs were performed using 400 ng of the recombinant proteins Trx::ZmbHLH128 or Trx::ZmbHLH129 and 50 fmol of radiolabeled probe. Competition assays were performed by adding a 200- to 400-fold molar excess of unlabeled probe. Trx::OsPIF14 (LOC_Os07g05010) and Trx protein, both purified by Cordeiro et al. (2016), were used as negative controls. Each protein was mixed with probe in a 10 µl reaction containing 10 mM HEPES (pH 7.9), 40 mM KCl, 1 mM EDTA (pH 8), 1 mM DTT, 50 ng herring sperm DNA, 15 µg BSA and 10% (v/v) glycerol. Binding reactions were incubated for 1 h on ice, and the bound complexes were resolved on a native 5% polyacrylamide gel (acrylamide:bis-acrylamide 37.5:1). Gel electrophoresis and detection of the radioactive signal were performed as described in Serra et al. (2013).
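For the competition assays, the amount of unlabeled probe follows directly from the 50 fmol of labeled probe and the 200- to 400-fold molar excess. The sketch converts that amount to picomoles, to an approximate mass, and to a concentration in the 10 µl reaction. The average of about 650 g mol⁻¹ per base pair and the 30 bp probe length are assumptions. The actual probe sequences are given in supplementary table S3.

```python
def competitor_pmol(labeled_fmol: float, fold_excess: float) -> float:
    """Unlabeled competitor (pmol) for a given molar excess over the labeled probe."""
    return labeled_fmol * fold_excess / 1000.0  # fmol -> pmol


def dsdna_ng(pmol: float, length_bp: int, g_per_mol_bp: float = 650.0) -> float:
    """Approximate mass (ng) of a double-stranded oligonucleotide."""
    return pmol * length_bp * g_per_mol_bp / 1000.0  # pmol * g/mol = pg; pg -> ng


def conc_um(pmol: float, volume_ul: float) -> float:
    """pmol per µl equals µM."""
    return pmol / volume_ul


for fold in (200, 400):
    pmol = competitor_pmol(50, fold)
    print(fold, pmol, round(dsdna_ng(pmol, 30), 1), conc_um(pmol, 10))
```

Under these assumptions, the competitor ranges from 10 to 20 pmol, or 1 to 2 µM in the binding reaction.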

Phylogenetic Analyses
ZmbHLH128 and ZmbHLH129 were used as references to identify closely related bHLH genes of Z. mays, S. bicolor, Setaria viridis, S. italica, O. sativa, and B. distachyon in the Phytozome database (Goodstein et al. 2012). Predicted CDSs were aligned using MUSCLE. The resulting alignment was used to infer a maximum likelihood phylogenetic tree under the GTR + G + I nucleotide substitution model (1,000 bootstrap pseudoreplicates) in MEGA 7 (Kumar et al. 2016).
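The 1,000 bootstrap pseudoreplicates were generated in MEGA 7. For readers unfamiliar with the procedure, the sketch below shows only the resampling step of a nonparametric bootstrap. Alignment columns are drawn with replacement to build each pseudoreplicate. A tree would then be inferred from every replicate, and branch support is the percentage of replicate trees that recover a given clade. Tree inference under GTR + G + I is not reproduced here, and the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudoreplicate: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every taxon, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def pseudoreplicates(alignment: dict[str, str], n: int = 1000,
                     seed: int = 0) -> list[dict[str, str]]:
    """Generate `n` reproducible pseudoreplicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


toy = {"taxonA": "ATGCCGTA", "taxonB": "ATGACGTT", "taxonC": "ATTCCGAA"}
reps = pseudoreplicates(toy, n=1000, seed=42)
print(reps[0])
```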

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.