Zijie Shen, Yuan Lin, Quan Zou, Transcription factors–DNA interactions in rice: identification and verification, Briefings in Bioinformatics, Volume 21, Issue 3, May 2020, Pages 946–956, https://doi.org/10.1093/bib/bbz045
Abstract
The completion of the rice genome sequence paved the way for rice functional genomics research. Additionally, the functional characterization of transcription factors is currently a popular and crucial objective among researchers. Transcription factors are one of the groups of proteins that bind to either enhancer or promoter regions of genes to regulate expression. On the basis of several typical examples of transcription factor analyses, we herein summarize selected research strategies and methods and introduce their advantages and disadvantages. This review may provide some theoretical and technical guidelines for future investigations of transcription factors, which may be helpful to develop new rice varieties with ideal traits.
Introduction
Rice is one of the most important crops worldwide, with more than half of the global population relying on it as a food source. To promote rice improvement, a high-quality, fully sequenced rice genome is imperative. Therefore, the International Rice Genome Sequencing Project was launched in 1998 [1, 2]. The sequenced rice genome and the development of sequencing technologies ushered in a new and exciting chapter in rice functional genomics research [3, 4], which aims to clarify the functions and structures of large numbers of genes and eventually to functionally characterize the whole rice genome [5]. The proteins encoded by genes, as well as the associated regulatory networks, collectively determine specific agronomic traits. Additionally, genes may be assembled and integrated to breed rice varieties with ideal traits, including high grain yield and quality and resistance to multiple abiotic and biotic stresses. Rice, which is an excellent model monocotyledonous plant species, shares a close syntenic and collinear relationship with other crops, such as maize and wheat. Therefore, rice functional genomics research may also provide substantial support for genetic analyses of other crops.
The identification of transcription factors (TFs) is a crucial aspect of functional genomics research [6]. TFs are vital for controlling gene expression [7]. Moreover, TFs are involved in the regulation of various biological processes in rice, such as development, hormone signal transduction, metabolism and responses to various stresses [8–13]. Interactions between proteins and DNA influence almost all aspects of cellular function [14, 15]. In this regard, several computational predictors have been proposed to predict the DNA-binding proteins only based on the sequence information [16–19]. Furthermore, some predictors have been proposed to detect the homology relationship among proteins sharing similar structures and functions [20–22]. The Database of Rice Transcription Factors contains 2025 putative TFs in Oryza sativa L. ssp. indica and 2384 putative TFs in O. sativa L. ssp. japonica [23], all of which are distributed in 63 families. TFs are frequently classified based on their DNA-binding domains. For example, the WRKY family is named according to the first four amino acids of the most highly conserved DNA-binding domain [24]. Additionally, MYB proteins consist of a highly conserved MYB domain [25], whereas the basic leucine zipper (bZIP) TFs harbor a conserved bZIP domain [26]. In rice, most of the studied TFs belong to the bZIP, MYB/MYC, WRKY, AP2/ERF and NAC families [27–35]. Examples include OsFD1, which is a bZIP TF that is closely related to flowering [28], and OsMYB2P-1, which affects abiotic stress responses and root architecture [29]. The SHALLOT-LIKE1 (SLL1) gene encodes an MYB TF that regulates leaf abaxial cell development [30]. Salicylic acid (SA) plays an important role in rice basic defense [36]. Rice WRKY45 is a key factor of the branched SA pathway and regulates rice resistance to rice blast disease and bacterial leaf blight disease, which are major rice diseases [31, 32]. 
The SHAT1 gene is reportedly expressed as part of the regulatory pathways related to seed shattering [33]. Several members of the AP2/ERF family play important regulatory roles that enable rice plants to adapt to abiotic stresses [34]. Moreover, OsNAC10 improves rice tolerance to environmental stresses and increases grain yield under drought conditions [35].
Database | Species | Redundancy | Free | Site | References |
---|---|---|---|---|---|
TRANSFAC | Eukaryotic | Y | N | http://gene-regulation.com/ | [51] |
JASPAR | Eukaryotic | N | Y | http://jaspar.binf.ku.dk/ | [56] |
PlantTFDB | Plant | N | Y | http://planttfdb.cbi.pku.edu.cn/ | [41] |
PlantProm | Plant | N | Y | http://softberry.com | [58] |
PlantCARE | Plant | U | Y | http://bioinformatics.psb.ugent.be/webtools/plantcare/html/ | [49] |
PLACE | Plant | U | Y | http://www.dna.affrc.go.jp/htdocs/PLACE/ | [59] |
AGRIS | Arabidopsis thaliana | U | Y | https://agris-knowledgebase.org/ | [60] |
Y: Yes, N: No, U: Unknown.
There is currently an abundance of bioinformatics databases and tools useful for functionally annotating TFs and predicting TF binding sites (TFBSs) in rice [23, 37–41]. The public databases provide large amounts of important information relevant for studying rice TFs. However, the TFBSs predicted by the available tools may be incorrect. Consequently, researchers must either validate the accuracy of the predicted binding sites or identify the real TFBSs with molecular biology techniques. Additionally, proving how TFs regulate the downstream target genes can be difficult, with researchers needing to select suitable strategies and methods to generate the necessary evidence.
The classical view of interactions between TFs and methylated DNA is that DNA methylation inhibits TFs from binding their target elements [42]. However, emerging evidence shows that some TFs bind to methylated DNA motifs and possess sequence-dependent mCpG-binding activity [43–45]. To the best of our knowledge, there are no reports of methylated DNA–TF interactions in rice.
In this review article, we summarize some of the typical and effective techniques for testing protein–DNA interactions, including methylated DNA–TF interactions, which may be useful for future studies involving rice TFs.
Research technology
The increasing number of experimentally characterized TFBSs prompted some databases to start collecting TFBS-related information (Table 1) [46–49]. Examples include TRANSFAC and JASPAR, which are two major TF databases [50]. In TRANSFAC, the experimentally derived TFBSs are annotated with the associated experimental techniques and conditions, and the credibility of the detected sites has been evaluated [51]. Moreover, TRANSFAC contains the most data as well as the most comprehensive related information. However, there are some drawbacks to this database, including information redundancy and the considerable variability in model quality. Furthermore, TRANSFAC only provides some data to nonprofit organizations. In contrast, JASPAR is the most comprehensive and publicly available database of matrix-based TF-binding profiles. These profiles were exclusively derived from published experimentally defined sites and are stored as position frequency matrices [52–57].
Both major TF-related databases enable researchers to predict the DNA-binding motifs of TFs. TFBSs are conserved but not identical; instead, they exhibit a specific pattern (i.e. a motif). With the development of algorithms, the weight matrix has come to be regarded as a general and accurate way to describe motif characteristics [50, 61]. A position weight matrix (PWM), which is also known as a position-specific weight matrix or position-specific scoring matrix, is an essential and commonly used model for detecting motifs. Match™ is a powerful web-based, weight matrix-based tool that is closely interconnected with the PWM library in the TRANSFAC database. Match™ uses the matrix similarity score and the core similarity score to search for putative TFBSs in the promoter sequences of target genes [37]. The JASPAR database recommends the use of its data sets and the ConSite tool instead of its own tools [38]. ConSite is a user-friendly web-based tool that uses high-quality TF-binding profiles and phylogenetic footprinting to predict TFBSs [38]. Phylogenetic footprinting is based on the observation that selective pressure makes functional regulatory regions more conserved than non-coding regions with no sequence-specific function [62, 63]. Accordingly, TFBSs are predicted by searching for conserved motifs in homologous genes from multiple species. Although profile-based model frameworks are able to accurately describe TFBSs, the predicted TFBSs are not necessarily correct. For example, some functional regulatory sites are less conserved than background sequences.
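To make the weight-matrix idea concrete, the following minimal sketch converts a position frequency matrix into a log-odds PWM and scans a promoter sequence for high-scoring windows. The counts, the pseudocount, the uniform background, the threshold and the function names are all illustrative assumptions; they are not taken from TRANSFAC, JASPAR or Match™, whose scoring schemes differ in detail.

```python
import math

# Hypothetical position frequency matrix (counts per base at each motif
# position). The numbers are invented and resemble a CACGTG-like core.
PFM = {
    "A": [1, 18, 0, 1, 0, 1],
    "C": [18, 1, 19, 0, 1, 0],
    "G": [0, 1, 1, 19, 0, 18],
    "T": [1, 0, 0, 0, 19, 1],
}


def pfm_to_pwm(pfm, background=0.25, pseudocount=1.0):
    """Convert counts to log2 odds scores (assumes a uniform background)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT") + 4 * pseudocount
        column = {}
        for base in "ACGT":
            probability = (pfm[base][i] + pseudocount) / total
            column[base] = math.log2(probability / background)
        pwm.append(column)
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = pfm_to_pwm(PFM)
# Hypothetical promoter fragment; only the forward strand is scanned here.
for start, window, score in scan("TTGACACGTGGCAT", pwm, threshold=5.0):
    print(start, window, round(score, 2))
```

A real analysis would also scan the reverse strand and calibrate the threshold against background sequences, which is one reason predicted sites still need experimental validation.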
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a rapid and sensitive method for predicting the genome-wide occupancy of protein-binding motifs [64]. In eukaryotic cells, DNA is largely inaccessible because it is wrapped around nucleosomes and packaged into chromatin [65]. When chromatin changes from a condensed state to an accessible state, TFs or other DNA-binding proteins are able to access the DNA and subsequently regulate gene expression [65]. Transposons are thought to insert preferentially into regions of accessible chromatin. ATAC-seq exploits this property of the hyperactive Tn5 transposase to capture open chromatin sites [64]. Compared with DNase-seq and FAIRE-seq, ATAC-seq is simpler and more efficient. In plants, the presence of cell walls and contamination from subcellular organelles complicate the application of ATAC-seq. Roger B. Deal isolated nuclei by INTACT (isolation of nuclei tagged in specific cell types) to optimize ATAC-seq [66]. Fluorescence-activated nuclei sorting has also been used to optimize ATAC-seq [67]. Chromatin immunoprecipitation sequencing (ChIP-seq) is a comprehensive approach that combines experimental biology with in silico analyses. The combination of ATAC-seq and ChIP-seq can be used to predict the binding of the TFs of interest to DNA [64, 68]. However, these new technologies inevitably have certain biases [69]. Therefore, researchers will need to test the predicted sites with another experiment.
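One simple way to combine the two data types is to keep only the ChIP-seq peaks that fall within accessible chromatin. The sketch below assumes that peaks have already been called and are available as (chromosome, start, end) tuples with half-open coordinates; the coordinates and function names are hypothetical, and dedicated interval tools would be used for real peak sets.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def chip_peaks_in_open_chromatin(chip_peaks, atac_peaks):
    """Keep ChIP-seq peaks that overlap at least one ATAC-seq peak.

    This is a quadratic-time comparison, which is fine for a small
    illustration but too slow for genome-wide peak sets.
    """
    return [
        peak for peak in chip_peaks
        if any(overlaps(peak, open_region) for open_region in atac_peaks)
    ]


# Hypothetical peak coordinates, for illustration only.
atac_peaks = [("Chr1", 1000, 1500), ("Chr2", 200, 600)]
chip_peaks = [("Chr1", 1400, 1700), ("Chr1", 3000, 3300), ("Chr2", 600, 800)]

supported = chip_peaks_in_open_chromatin(chip_peaks, atac_peaks)
print(supported)
```

Peaks retained by such a filter are candidates only; as noted above, they still need to be tested with an independent experiment.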
Yeast one-hybrid assay
The yeast one-hybrid (Y1H) assay is an important technique for assessing physical interactions between TFs and their target promoters. The assay is based on the yeast two-hybrid protein–protein interaction assay. The Y1H assay consists of the following two components (Figure 1A): a reporter construct with the DNA element of interest and an expression construct comprising the TF of interest and a yeast transcription activation domain. The reporter construct with the ‘bait’ DNA element is commonly called the ‘bait’, while the expression construct is often referred to as the ‘prey’. A typical TF generally consists of a DNA-binding domain, a trans-activating domain and a signal-sensing domain [7]. In a Y1H system, the DNA-binding domain of a yeast TF (e.g. GAL4) is replaced by a TF lacking the transcription activation function. Therefore, if there are physical interactions between the TF and DNA, the downstream reporter will be activated regardless of whether the TF is an activator or repressor [70].

Figure 1. Cartoon illustration of the mechanism underlying the Y1H assay. (A) Structures of a reporter construct and a prey construct. (B) Yeast cells are transformed with both ‘bait’ and ‘prey’ constructs.
There are several types of Y1H reporter constructs, each of which was designed for distinct experiments. The yeast strains used for Y1H commonly carry mutations in multiple auxotrophic marker genes, of which trp1 (tryptophan), leu2 (leucine), his3 (histidine) and ura3 (uracil) are generally used. Auxotrophic mutants carrying these mutations are unable to grow in the absence of the corresponding essential nutrient, and activation of the reporter gene through an interaction with a specific TF rescues the auxotrophic phenotype [71]. A common Y1H reporter gene is HIS3, which enables yeast cells to grow in the absence of exogenously supplied histidine. Additionally, the Y1H reporter construct is often designed to express the gene encoding beta-galactosidase. All of the colonies that are able to grow are then tested in a colorimetric assay. Specifically, the production of beta-galactosidase results in the cleavage of X-gal to produce blue colonies.
As indicated in Figure 1, the reporter construct with the ‘bait’ cis-acting element and the ‘prey’ construct with a TF lacking an activation domain are transferred into yeast cells. The interaction between the DNA-binding domain of the ‘prey’ and the ‘bait’ will activate the expression of the reporter gene. A lack of interaction between the ‘prey’ and ‘bait’ results in no expression.
The Y1H assay can confirm the interactions between proteins and the DNA of interest without requiring purified protein. Additionally, the TF-derived protein retains its natural structure in yeast cells. Moreover, Y1H assays are relatively simple to complete. These advantages have made the Y1H assay a widely used technique to prove protein–DNA interactions. From its initial application to the present, the Y1H assay has been important for verifying the binding of TFs to DNA [72–76].
Dual-luciferase reporter assay system
The luciferase genes have been among the most widely used reporter genes since they were first used as a reporter of gene expression in rice. The luciferase reporter assay detects the bioluminescence generated by the reaction between luciferase and its substrate. This system has been used to verify interactions between TFs and their target DNA sequences. The expression of the reporter gene is measured to quantify the activity of the TF. The luciferase reporter assay consists of the following two components: a reporter construct with the target promoter and an effector construct with the TF. These components are somewhat similar to those of the Y1H system. Both plasmids are transferred into plant cells, and if the TF activates the target promoter, the luciferase gene will be expressed to yield luciferase. The intensity of the luminescence produced by the reaction between luciferase and its substrate may be used to determine the luciferase activity level, which indicates whether the TF is capable of interacting with the target promoter fragment. However, differences in the transfection efficiency of each sample may result in different luminescence intensities between the experimental group and the control group. Therefore, it may be difficult to judge whether any differences between the experimental and control groups are caused by the activity of the promoter itself or by operational variables such as transfection efficiency.
Promega introduced the Dual-Luciferase® Reporter (DLR™) assay system, thereby providing researchers with a highly efficient method for applying dual reporters to improve the experimental accuracy. Dual reporters transfer two reporter constructs into a single system. One reporter construct with an experimental reporter gene (e.g. pGL3 family vector) is coupled to a regulated promoter, while the other reporter functions as an internal control, which contains a distinct reporter gene (e.g. pRL family vector), and is coupled to a constitutive promoter [77]. The expression of the pGL3 vector sequence is related to the transcriptional activity of its regulated promoter. The transcriptional activity of the pRL vector sequence is unaffected by the external conditions and is used to normalize and measure the transcriptional activity of the pGL3 vector. It is possible to minimize the effects of several inherent factors that can influence the accuracy of the experiment, including differences among cell plates, cell numbers, cell viability, transfection efficiency and lysis efficiency [77]. Compared with the luciferase reporter assay, the dual-luciferase reporter assay likely decreases the effects of extraneous factors to optimize the experimental accuracy.
The pGreenII62-SK and pGreenII0800-LUC vectors are commonly used to carry the TF and the sequence of interest, respectively, in rice dual-luciferase reporter assays [78, 79]. The pGreenII0800-LUC vector carries the firefly luciferase (LUC) gene under the control of the promoter of interest as well as the Renilla luciferase (REN) gene under the control of the CaMV 35S promoter [5]. To generate the constructs for a dual-luciferase reporter assay, a coding sequence is inserted into the pGreenII62-SK vector as an effector and a promoter sequence is cloned into pGreenII0800-LUC as a reporter. Rice cells are then co-infiltrated with these vectors. If the TF is able to specifically bind to the promoter, the LUC reporter gene will be expressed to produce the encoded firefly luciferase enzyme. The REN reporter gene is simultaneously expressed under the control of the CaMV 35S promoter. The rice cells are subsequently lysed in passive lysis buffer to release the firefly and Renilla luciferase enzymes. Next, Luciferase Assay Reagent II is added to the cell lysates to generate the firefly luminescence signal. After the firefly signal has been measured, the Stop & Glo Reagent is added to the solution, which rapidly quenches the firefly luminescence and simultaneously provides the substrate for the Renilla luciferase reaction [77]. This is the process in a classical dual-luciferase reporter assay in rice (Figure 2). Additionally, the negative control cells are co-infiltrated with the pGreenII62-SK empty vector and the reporter construct.
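The normalization at the heart of the dual-luciferase assay can be sketched in a few lines. The example below assumes that each well yields one firefly (LUC) and one Renilla (REN) luminescence reading, and that the effect of the TF is reported as the mean LUC/REN ratio relative to the empty-vector control. The readings and function names are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def relative_luc(firefly, renilla):
    """LUC/REN ratio for one well; REN corrects for transfection efficiency."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_activation(effector_wells, control_wells):
    """Mean LUC/REN with the TF effector relative to the empty-vector control.

    Each argument is a list of (firefly, renilla) readings, one per replicate.
    """
    effector = mean(relative_luc(f, r) for f, r in effector_wells)
    control = mean(relative_luc(f, r) for f, r in control_wells)
    return effector / control


# Hypothetical luminometer readings (arbitrary units).
tf_wells = [(8000, 2000), (12000, 3000)]
empty_vector_wells = [(1000, 1000), (3000, 3000)]

print(fold_activation(tf_wells, empty_vector_wells))
```

Note that the two TF wells differ in raw firefly signal but give the same LUC/REN ratio, which is the situation the internal control is designed to handle. A value below 1 would suggest repression of the promoter rather than activation.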

Electrophoretic mobility shift assay
The electrophoretic mobility shift assay (EMSA) is a rapid and simple technique that is generally used to detect and study sequence-specific DNA-binding proteins such as TFs in vitro [80, 81]. It is based on the principle that proteins binding to DNA probes form macromolecular complexes, which migrate more slowly than free nucleic acids in a non-denaturing polyacrylamide gel. The DNA probes can be labeled with covalent or non-covalent fluorophores, radioisotopes or biotin [82–84]. Although diverse DNA sequences may be used in an EMSA, from short oligonucleotides to sequences comprising several thousand nucleotides/base pairs, short nucleic acids may be preferable when the target DNA is well defined [80, 81, 85]. Compared with long nucleic acid sequences, shorter sequences are cheaper and more easily synthesized. Additionally, the associated decrease in the number of non-specific protein-binding sites helps to prevent interference from other proteins. The proteins may be derived from crude nuclear or whole cell extracts or from prokaryotic expression systems. However, the concentration of the protein of interest must be greater than 1 μg/μl.
The EMSA requires at least four reaction mixtures, which are used to prove that a protein can directly and specifically bind to the promoter of a target gene. These mixtures are as follows (Figure 3): a negative control with only a biotin-labeled probe; a conventional binding reaction with a biotin-labeled probe and the protein; a cold-competition control with a biotin-labeled probe, an unlabeled (cold) probe and the protein; and a mutant-probe competition control with a biotin-labeled probe, an unlabeled mutant probe and the protein. Both the cold probe and the mutant probe need to be used at considerably higher concentrations than the biotin-labeled probe. If the band corresponding to the free biotin-labeled probe is not at the lowest position, there are likely some impurities in the control. If the band in the lane containing the biotin-labeled probe and the protein is higher than the free-probe band, this suggests that the protein interacts with the DNA probe. However, the biotin tag itself may bind to proteins, thereby resulting in a false-positive mobility shift. The cold-competition control is used to rule out this possibility. If the mobility shift is due to the protein specifically binding to the probe sequence, the excess cold probe outcompetes the labeled probe, and the band in this lane should be at the same level as the band of the negative control. Conversely, if the cold probe does not affect the interaction between the biotin-labeled probe and the protein, the mobility shift observed in the binding reaction is likely due to biotin. The mutant probe should be unable to affect the interaction between the biotin-labeled probe and the protein, so the shifted band should persist in this lane; this control therefore indicates that binding depends on the specific probe sequence. All of the results collectively indicate whether TFs can specifically bind to promoters. Additionally, the positive control in commercially available EMSA kits is used to ensure that the assay is conducted correctly under optimal experimental conditions.
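The logic of the four EMSA lanes can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: each lane is reduced to a yes/no answer on whether a shifted (retarded) band is visible, whereas real gels show graded band intensities. The function name and the wording of the returned messages are hypothetical.

```python
def interpret_emsa(probe_only_shift, binding_shift,
                   cold_competition_shift, mutant_competition_shift):
    """Interpret four EMSA lanes.

    Each argument is True if a shifted band is visible in that lane:
      probe_only_shift         - labeled probe alone (negative control)
      binding_shift            - labeled probe + protein
      cold_competition_shift   - labeled probe + excess unlabeled probe + protein
      mutant_competition_shift - labeled probe + excess unlabeled mutant probe + protein
    """
    if probe_only_shift:
        return "invalid: shifted band without protein suggests impurities"
    if not binding_shift:
        return "no binding detected"
    if cold_competition_shift:
        # Unlabeled probe failed to compete, so the shift may depend on
        # the label itself or on non-specific binding.
        return "shift not competed by cold probe: possibly label-dependent"
    if not mutant_competition_shift:
        # A mutated sequence competed as well as the wild-type sequence.
        return "shift competed by mutant probe: binding is not sequence-specific"
    return "specific binding to the probe sequence"


# Expected pattern for a specific interaction: shift with protein,
# lost with cold competitor, retained with mutant competitor.
print(interpret_emsa(False, True, False, True))
```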

Chromatin immunoprecipitation
Chromatin immunoprecipitation (ChIP) is an excellent method for analyzing the interaction between proteins and DNA in vivo [86]. This technique involves immobilizing protein–DNA complexes in living cells with a chemical cross-linking agent (e.g. formaldehyde) and then randomly breaking the chromatin into small fragments of a certain length by sonication (hydrodynamic shearing), enzyme digestion or a micrococcal nuclease treatment [87]. When formaldehyde is used as the cross-linking agent, sonication is usually the preferred option. Finally, the specificity of antibodies for their antigens is exploited to isolate the protein of interest together with the bound DNA fragment [88]. Quantitative real-time polymerase chain reaction (PCR) is the preferred method for analyzing the ChIP DNA fragment [87]. In addition, with the maturation and falling cost of next-generation sequencing technology, ChIP-seq has gradually become the most common technique for studying protein–DNA relationships. However, ChIP-seq is unable to provide a precise set of binding sites because of contaminating non-specific genomic DNA [89, 90]. Lambda exonuclease only degrades naked double-stranded DNA in the 5′ to 3′ direction. Compared with ChIP-seq, ChIP-exo uses lambda exonuclease to reduce background noise [90]. However, the digestion step reduces the amount of DNA recovered from ChIP, which affects the subsequent sequencing. Library construction requires two inefficient ligation steps to add adaptors to both ends of the DNA. During PCR, low amounts of starting DNA often produce noisy data because of over-amplification. A robust ChIP-exo protocol named ChIP experiments with nucleotide resolution through exonuclease, unique barcode and single ligation (ChIP-nexus) overcomes the problem caused by low amounts of DNA [91]. This technique employs an efficient DNA self-circularization step to add an adaptor to the DNA ends. Therefore, ChIP-nexus libraries can be sequenced without requiring more starting DNA.
In rice, ChIP experiments have been widely used to investigate the interactions between proteins of interest and the promoters of target genes [78, 79, 92–95]. When the aim is to detect TFBSs rather than to analyze epigenetic marks, a ChIP experiment generally requires the development of transgenic rice seedlings. A GFP coding sequence is usually fused in frame to the 3′ end of the TF gene sequence. However, the Flag-tag or other tags may alternatively be used to label the TFs [92].
Regarding rice, the cross-linking of protein and DNA is difficult in old tissue. Therefore, ChIP experiments are usually completed with 2–4-week-old transgenic rice seedlings. Formaldehyde is an excellent cross-linking agent that is able to form stable protein–protein, protein–DNA and protein–RNA complexes in cells. Moreover, the cross-links generated by formaldehyde are completely reversible, which facilitates the subsequent analysis of the protein and DNA [96, 97]. The duration of the cross-linking step is one of the crucial factors determining whether the ChIP experiment will succeed. If this step is too short, too little of the target fragment will be recovered, ultimately leading to a false-negative result. If this step continues for too long, it will not only cause excessive background noise but also interfere with the sonication treatment [86]. As a general guideline, the cross-linking should be stopped when the material becomes transparent [98]. Additionally, the cross-linked material needs to be thoroughly washed with ddH2O to remove any residual formaldehyde, which may adversely influence the subsequent experimental steps. The cross-linked rice tissue is then ground into a fine powder in liquid nitrogen, from which chromatin is extracted [98]. After resuspending the chromatin in nuclei lysis buffer, the solution is sonicated to shear the chromatin into fragments of a size that depends on the study objective. The size of the resulting fragments is checked by 1% agarose gel electrophoresis. The protein of interest is immunoprecipitated from the fragmented chromatin solution with an antibody specific for the tag fused to the target protein [92, 95]. Note that the quality of the antibody is a critical factor for the recovery of DNA fragments. Different antibodies have different affinities for their epitopes, which can affect the ChIP results [86, 87, 99]. The DNA is recovered from the precipitated protein–DNA complexes by reversing the cross-links, after which the DNA is purified prior to any analyses. The process involved in a ChIP experiment is presented in Figure 4.
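When the purified ChIP DNA is analyzed by quantitative real-time PCR, enrichment is often expressed as a percentage of the input chromatin. The following sketch shows one common way to do this calculation. It assumes an amplification efficiency of about 100% (one doubling per cycle) and a known fraction of chromatin set aside as input; the review does not specify these conventions, and the Ct values and function names below are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input chromatin recovered in an immunoprecipitated sample.

    ct_ip          - Ct of the immunoprecipitated sample
    ct_input       - Ct of the input sample
    input_fraction - fraction of chromatin saved as input (0.01 means 1%)
    Assumes the template doubles every PCR cycle.
    """
    # Adjust the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_enrichment(ct_ip, ct_mock, ct_input, input_fraction):
    """Signal with the specific antibody relative to a mock (no-antibody) IP."""
    specific = percent_input(ct_ip, ct_input, input_fraction)
    mock = percent_input(ct_mock, ct_input, input_fraction)
    return specific / mock


# Hypothetical Ct values for one promoter fragment, with 1% input.
print(percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01))
print(fold_enrichment(ct_ip=24.0, ct_mock=27.0, ct_input=25.0, input_fraction=0.01))
```

A fragment containing a genuine binding site would be expected to show higher enrichment than a control region lacking the site, measured in the same samples.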

Conclusion
There is currently considerable interest in TFs, which are important for rice functional genomics research. Analyses of the interactions between TFs and DNA sequences can be divided into the following three steps (Figure 5):
(i) Characterize the TFs by confirming their effects on rice (e.g. phenotype and physiological processes).
(ii) Search for target genes. Apply the publicly available information, experimental data and/or bioinformatics-based data to identify potential downstream target genes that may be directly regulated by TFs. For example, predict the TFBSs using bioinformatics tools, like those described herein, or genome-wide gene expression profiles to detect TF-related genes.
(iii) Confirm interactions between TFs and target genes. Experimentally verify that the TFs bind directly to the target genes.
The effective use of bioinformatics tools may increase the efficiency and accuracy of TFBS predictions. However, the above-mentioned methods all have their own drawbacks. In Y1H, it is well known that auto-activity is a contributing factor for false-positive results. Yeast endogenous TFs can bind to the DNA of interest, leading to an activated reporter. This phenomenon, in which target elements activate reporter genes without the TF of interest, is called auto-activity. Additionally, the ‘prey’ proteins may be toxic to yeast cells or may not be stably expressed in host cells, both of which will cause false-negative results. Regarding the EMSA, the cost of the biotin label is high, and biotin may affect protein–DNA interactions. Although 32P does not influence the interactions, the associated radioactivity is hazardous. The disadvantages of the ChIP assay include the fact that it requires highly specific, high-quality antibodies, and the cross-linking by formaldehyde may cause non-specific binding, resulting in a false-positive signal [86, 100]. Moreover, the transgenic rice seedlings required for the ChIP assay may be relatively difficult to produce [101]. Furthermore, the chromatin is randomly sheared into fragments, which may make the results difficult to reproduce. Thus, two or more techniques may be needed to confirm that the proteins of interest directly bind to target DNA sequences [78, 79, 93, 94, 102, 103]. Several yield-related TFs have been molecularly identified and characterized by these experiments (Table 2). These results and related information are collected in rice-related databases [23, 40, 41, 104].

Genes | Regulated traits | Methods | References |
---|---|---|---|
OsNAC2 | Plant height, flowering | ChIP-PCR, Dual-LUC, Y1H | [79] |
OsSPL16/GW8 | Grain width | ChIP-PCR, EMSA, LUC, Y1H | [109] |
OsTB1 | Tillering | ChIP-PCR, EMSA, LUC, Y1H | [110] |
OsMADS1/LGY3 | Grain number and size | ChIP-PCR, EMSA | [111] |
OsMADS57 | Tillering | ChIP-PCR, EMSA, LUC, Y1H | [110] |
OsLIC1 | Tiller angle | ChIP-PCR, EMSA | [112] |
IPA1 | Tillering, panicle development | ChIP-PCR, EMSA | [113] |
DST | Grain number | ChIP-PCR, EMSA | [114] |
SERF1 | Grain weight | ChIP-PCR, EMSA, LUC | [115] |
In previous studies on DNA methylation in rice, researchers tended to ignore the interactions between DNA and proteins that lack a methyl-CpG-binding domain. LUC, EMSA and ChIP have been applied to identify methylated DNA–TF interactions in mammals [105]. These applications provide excellent examples for studying methylated DNA–TF interactions in rice. Methylated DNA–TF interactions may play an important role in rice growth and development and should not be ignored.
The available experimental techniques for characterizing TFs can generate new information and data that can be used to improve algorithms, ultimately leading to more accurate predictions for future studies. More accurate predictors for DNA-binding protein identification considering the features of proteins will promote the development of this important field [106]. Particularly, feature representation learning recently has proven to be capable of capturing the high-latent sequential characteristics of proteins and DNAs [107, 108]. These methods are useful for elucidating the regulatory mechanisms of TFs, which may be relevant for the application of molecular techniques to develop new rice varieties with ideal traits.
On the basis of several typical examples of transcription factor analyses, we summarize selected research strategies and methods and introduce their advantages and disadvantages.
We divide the process of investigating TF–DNA interactions into three steps.
Agricultural researchers should pay more attention to methylated DNA–TF interactions in rice.
Bioinformatics algorithms and biological research can promote each other, paving the way for developing new rice varieties with ideal traits.
Acknowledgements
We thank Liwen Bianji, Edanz Editing China (www.liwenbianji.cn/ac) for editing the English text of a draft of this manuscript.
Funding
National Key R&D Program of China (2018YFC0910405) and the Natural Science Foundation of China (No. 61771331).
Zijie Shen is a research assistant in the Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China, Chengdu.
Yuan Lin is a system architect in Sparebanken Vest, Norway. She received her PhD degree from Norwegian University of Science and Technology in 2009.
Quan Zou is a professor in the Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China, Chengdu. He received his PhD from Harbin Institute of Technology in 2009.