Motivation: Inferring the relationships between transcription factors (TFs) and their targets has utmost importance for understanding the complex regulatory mechanisms in cellular systems. However, the transcription factor activities (TFAs) cannot be measured directly by standard microarray experiment owing to various post-translational modifications. In particular, cooperative mechanism and combinatorial control are common in gene regulation, e.g. TFs usually recruit other proteins cooperatively to facilitate transcriptional reaction processes.
Results: In this article, we propose a novel method for inferring transcriptional regulatory networks (TRN) from gene expression data based on protein transcription complexes and mass action law. With gene expression data and TFAs estimated from transcription complex information, the inference of TRN is formulated as a linear programming (LP) problem which has a globally optimal solution in terms of L1 norm error. The proposed method not only can easily incorporate ChIP-Chip data as prior knowledge, but also can integrate multiple gene expression datasets from different experiments simultaneously. A unique feature of our method is to take into account protein cooperation in transcription process. We tested our method by using both synthetic data and several experimental datasets in yeast. The extensive results illustrate the effectiveness of the proposed method for predicting transcription regulatory relationships between TFs with co-regulators and target genes.
Availability: The software TRNinfer is available from http://intelligent.eic.osaka-sandai.ac.jp/chenen/TRNinfer.htm
Supplementry information: Supplementary data are available at Bioinformatics online.
With the rapid advance of biological science, tremendous amounts of biological data have been produced by high-throughput technologies. In particular, microarray gene expression data have become an increasingly common information source that can quantitatively provide insights for understanding complex mechanisms in a cell at a system-wide level (Hughes et al., 2000). In addition to clustering microarray data for functional analysis, a hot topic on gene expression data analysis is the reconstruction of gene regulatory network which aims to reveal the underlying network of gene–gene interactions from the measured dataset of gene expression (Hartemink, 2005; Basso et al., 2005). In the last few years, a number of methods have been developed for inferring gene regulatory networks from time-course data such as Bayesian networks (Beal et al., 2005; Hughes et al., 2000; Husmeier, 2003), differential equations (Chen and Aihara, 2002; de Jong, 2002) and optimization techniques (Wang et al., 2006).
In contrast to reconstruction of gene–gene interactions, recently inferring direct relationships between transcription factors (TFs) and target genes attracts much attention. The transcription regulation of genes is achieved by DNA-binding proteins that attach to specific DNA promoter regions. These DNA-binding proteins are known as transcription regulators or TFs which recruit other proteins to form chromatic-modifying complexes and transcription apparatus to initiate RNA synthesis (Lee et al., 2002). In recent years, many experimental and computational techniques have been used to identify TFs and their target genes in several organisms such as Saccharomyces cerevisiae, Escherichia coli and Drosophila (Harbison et al., 2004; Iyer et al., 2001; van Steensel et al., 2003), of which one important technique is ChIP-on-chip, also known as genome-wide location analysis. This technique combines a modified chromatin immunoprecipitation (ChIP) assay with microarray technology and can identify the DNA sequences (target genes) occupied by specific DNA-binding proteins (TFs) in cells. However, ChIP-on-chip technique is not so reliable and suffers from a large proportion of false positives (Boulesteix and Strimmer, 2005).
Recently, several research groups integratively analyzed gene expression data and ChIP connectivity to infer transcription factor activities (TFAs) since TFAs cannot be measured directly by standard microarray experiment. For instance, Liao et al. (2003) have developed a method called network component analysis (NCA) which incorporates a prior qualitative knowledge about gene–TF interactions to infer the true regulatory activities. This method was extended as partial least square (PLS)-based NCA by Boulesteix and Strimmer (2005) which offers an efficient and sound way to infer true TFAs for any given connectivity matrix without much restriction like NCA. In addition to the methods for predicting the activities of TFs, several groups also attempted to recover the network structure between TFs and their targets using the gene expression levels of both TFs and target genes (Segal et al., 2003; Xiong et al., 2004). However, most existing algorithms for inferring transcription regulatory networks (TRN) from gene expression data assume that one TF without other cooperative proteins participates in the regulation process of a gene and directly use its mRNA level as activity profile. As is well known to us, this assumption is not biologically reasonable, since cooperative mechanism and combinatorial control are common in gene regulation and a TF usually recruits other proteins cooperatively to facilitate transcriptional function (Banerjee and Zhang, 2003; Eisbacher et al., 2003; Charron et al., 1999).
Although ChIP-chip technique can detect target genes occupied by specific DNA-binding proteins (TFs), generally a single TF cannot mediate transcription reactions. In the transcription process, TFs and several proteins cooperatively regulate the expression of a gene by combining into a protein transcription complex (TC) (Remenyi et al., 2004). Those proteins in a TC generally cannot be detected by ChIP-chip technique because they are non-DNA-binding proteins, despite their similar roles in regulating a target gene as TFs. A TC is a protein complex which is formed through a series of elementary biochemical reactions. This reaction mechanism obeys the law of mass action which means that the rate of any given elementary reaction is proportional to the product of the concentrations of the reactants. This law can be used to analyze the behavior of biochemical systems through dynamical equations and has been applied in reconstruction of biochemical reaction pathways from time-course data (Crampina et al., 2004; Srividhya et al., 2007). In this work, mass action law in transcription process and the expression levels of genes in TCs are used to approximate TFAs. This idea is motivated by the fact that a transcription process is often achieved by a TF with other cooperative proteins in a TC which is formed by a series of biochemical reactions. According to these biochemical reactions, the activity level of a TF can be viewed as a non-linear function with respect to the gene expression levels of the TF and its cooperative proteins in the TC instead of the mRNA level of a single constituent gene, which is consistent with biological experiments (Banerjee and Zhang, 2003; Eisbacher et al., 2003; Charron et al., 1999).
In this article, we propose a novel method for inferring TRN based on TFAs. We first approximate TFAs by respective TCs in transcription process based on mass action law. Then, with gene expression data and TFAs, inferring the direct interactions between TFs and target genes is formulated as a linear programming problem which is computationally fast and has a globally optimal solution in terms of L1 error. The proposed method (TRNinfer) not only exploits the activities of TC which contain protein–protein interaction information, but also can easily incorporate ChIP-Chip data as prior knowledge. In addition, the proposed approach can integrate multiple gene expression datasets from different experiments which can significantly alleviate the scarcity of data. A unique feature of TRNinfer is to take into account protein cooperation in transcription process. We tested our method by using both simulated data and several experimental data in yeast. Extensive results demonstrate the effectiveness and efficiency of the proposed method for predicting transcription regulatory relationships between TFs with co-regulators and target genes.
2.1 Transcriptional regulatory network
Transcription regulation has been responsible for organismal complexity and diversity in the course of biological evolution and adaptation. It can be represented by a set of differential equations with gene expression levels and TFAs as variables, i.e. the dynamical model of TRNwith time points t = t1, … ,tn denotes the difference of gene expression levels. a(t) = (a1(t), … ,ac(t)) denotes the activity level of TFs (TFAs) which are generally functions of x. In above equation, the first term denotes the synthesis rates of mRNAs and the second one denotes the degradation rates where K = diag(k1, …, km).
Generally, transcriptional regulations are non-linear, i.e. f (·) = ( f1(·), … , fm(·))T is a non-linear function vector. However, due to the complex and unclear structures of biological systems, linear or additive models are often adopted. If the first-order approximation of f is adopted, the linear form of (1) is2), we have the following matrix form which is a linear TRN: and B = (b(t1), … ,b(tn)) are m × n matrices, and A = (a(t1), … ,a(tn)) is a c × n matrix.
Biological systems are inherently non-linear and show multistability and non-linear oscillation which may not be so appropriately expressed by a linear model. Assume that a TRN can be expressed by a set of non-linear differential equations with each gene expression levels and TFAs as variables, i.e. the non-linear TRNfor i = 1, … ,m. f(u(t)) = (f(u1(t)), … , f(um(t)))T is non-linear functions. f(ui) specifies the production efficiency of gene i, as a function of the weighted accumulated effects of all aj modified by the gene's activation threshold or external stimulus bi(t). f(ui) is generally expressed as a sigmoidal transfer function: 2003).
In contrast to the dynamical model of (1), when the system in (1) is at an equilibrium, or x(t) = K− 1f(a(t)). Liao et al. (2003) suggested f(·) can be approximated by a log-linear model, i.e. the expression level of a gene is such a function of the connection strength of each regulatory pair and regulators' activities:12)).
We will see in the subsequent sections that for linear model (3), non-linear model (4) or log-linear model (6), whichever TRN model is adopted, the inference process of TRN can all be formulated into a linear programming (LP) framework.
2.2 Approximating TFA
The TFA levels cannot be measured directly by standard microarray experiment due to various post-translational modifications and biochemical reactions (Boulesteix and Strimmer, 2005). Most existing algorithms for inferring TRN directly use the mRNA level of a TF as its activity (Segal et al., 2003; Xiong et al., 2004). This assumption is not biologically reasonable since transcription process is usually achieved by one or more TFs with several cooperative proteins. These TFs and proteins form a TC through a series of biochemical reactions (Banerjee and Zhang, 2003; Eisbacher et al., 2003). Therefore, the TF activity level cannot be simply determined by its mRNA but depends on the expression levels of all genes in the TC. In this work, from the biochemical reactions which obey law of mass action, we estimate the TFA levels from the protein composition of TCs and the expression levels of individual genes. Mass action law means that the rate of any given elementary reaction is proportional to the product of the concentrations of the reactants (Crampina et al., 2004). Let us consider the following general chemical reaction:2004). Applying the law of mass action, the governing equations of the above reactions are given by 7) and Figure 1, i.e. the TC is composed of 7), the activity level a of A (represented by the activity of TF A0) can be given approximately by
It is worth noting that we also use the mRNA level of a constituent gene as its protein concentration in estimating the activity level of a TC. This assumption may not be always true since some post-translational modification events such as phosphorylation, degradation can affect transcriptional activities (Tootle and Rebay, 2005). However, as mentioned by Tootle and Rebay (2005), studies on post-translational modifications of TFs are only in the initial stages, and it is a still difficult task to quantify the effects of these biological events. Although using the mRNA levels of all constituent genes in a TC is one step forward, more elaborated formulations are needed in the future.
2.3 Inferring TRN
Assume that there are multiple datasets for one organism which may be measured under various environments by different laboratories. Let Xk denote the kth gene expression data (m × nk) and Ak denote the kth activity data of TF complexes (t × c). Then, according to the differential equation (3) by ignoring the last term, we have10) is an optimization problem with positive combination of L1 norm of variables Jij which can be transformed into a LP problem through a well-known procedure (see Supplementry materials). Owing to L1 norm, generally the optimal solution of (10) has as many zeros for and |J| as possible, which exactly serves our purpose, i.e. consistent and sparse structure. Also the proposed framework can handle multiple datasets even if they are obtained from different experiments or conditions. Although the solution may depend on λ, it is a single parameter which can be tuned in a relatively easy manner or be simply tested for a range of its value.
For the sigmoidal non-linear model of the TRN (4), we have6), a similar LP problem in the same framework can be obtained: by using linear difference scheme (Yeung et al., 2002; Wang et al., 2006). Above models can employ multiple microarray datasets from different conditions or experiments simultaneously which will alleviate significantly the scarcity of time points in a single dataset. In order to make the inference more accurate when only a single dataset is available, cubic spline interpolation methods may be explored to calculate the derivatives of x(t) (de Boor, 1978).
3 EXPERIMENT RESULTS
In this section, we conducted several numerical experiments to evaluate the method by using multiple synthetic datasets and yeast microarray gene expression data. The algorithm is implemented in the Fortran programming language on a 1.66 GHz Pentium 4 PC and the software (TRNinfer) is available from http://intelligent.eic.osaka-sandai.ac.jp/chenen/TRNinfer.htm
3.1 Synthetic data
3.1.1 Synthetic time-course data
The first example is a synthetic TRN with three genes and three TFs (Fig. 2a). In this network, TF1 with the cooperation of protein P1 and protein P2 forms a TC1 to regulate tf2 (activation) and gene p3 (repression), while TF3 and protein 3 form a TC3 to regulate gene p2 and tf1. TF2 as a special TC2 regulates tf3 (repression) and gene p1 (activation). These relationships are governed by the following non-linear differential equations:
We randomly chose the initial condition of the system and took several points of x as a measured time-course dataset. With five different initial conditions, we obtained five different datasets with five time points, respectively. According to the method for computing TFAs (9), xtf1(t)x1(t)x2(t), xtf2(t) and xtf3(t)x3(t), respectively represent the activities of TF1, TF2 and TF3 (or TC1, TC2 and TC3 in Fig. 2a). With the time-course data and the activity levels of TFs, we applied LP model (10) to reconstruct the connectivity matrix of TFs and target genes (i.e. the Jacobian matrix J). The true regulatory structure and coefficients can be recovered accurately by TRNinfer. Specifically, when a single dataset is used and the sparse parameter λ = 0, we obtained a connection matrix shown in the first subtable of Table 1, where the inferred connection matrix is dense and the corresponding entries are not accurate compared with the original coefficients in (13) or Figure 2b. By increasing the sparse parameter to λ = 0.05, the inferred connection matrix is shown in the second subtable of Table 1. We can see that it is sparser than the one obtained at λ = 0 and the corresponding entries are also more accurate. Such results justify the benefit brought by the sparse parameter. Moreover, when multiple datasets are used, we obtained a more accurate connection matrix which illustrates the advantage of employing more datasets (see the third and fourth subtables of Table 1).
|L = 1, λ = 0.0||p1||p2||p3||tf1||tf2||tf3|
|L = 1, λ = 0.05|
|L = 5, λ = 0.0|
|L = 5, λ = 0.5|
|L = 1, λ = 0.0||p1||p2||p3||tf1||tf2||tf3|
|L = 1, λ = 0.05|
|L = 5, λ = 0.0|
|L = 5, λ = 0.5|
The results for a simulated example with noises ξi(t) (see Table 4 in Supplementry materials) also illustrate the effectiveness of the LP method on suppressing noises and the advantages of the sparse parameter and multiple datasets.
3.1.2 The hemoglobin data
Now we use a network of seven hemoglobin solutions (denoted by M1, … , M7) and their absorbance spectra which were measured in Liao et al. (2003) to evaluate our method. Each solution contains a combination of three components: oxyhemoglobin, methemoglobin and cyano-methemoglobin. According to Beer–Lambert law, the absorbance spectra of the mixture can be described as a linear combination of the composition proportions of three components and the absorbance spectra of each pure solution (Liao et al., 2003). The mixing diagram representing the compositions of pure components is shown in Figure 3. The mixture composition is known and we test if or not TRNinfer can correctly identify the compositions of each mixture and the corresponding concentration. Here LP model (12) is adopted since the dataset is not time-course and can be viewed as steady-state data. The result of our method on this dataset is summarized in Table 2. Clearly, TRNinfer can identify the components in each of mixed hemoglobin solution. The inferred compositions of each component are near the true but with a certain error. This is because the mixed absorbance spectra in experiment data are not exactly equal to the linear combination of the composition proportions of three components and the absorbance spectra of each pure solution. Provided that this linear combination exactly holds, our method can not only identify the components but also exactly infer the corresponding composition proportions (see Table 5 in Supplementry materials).
3.2 Experimental data
In this section, we apply TRNinfer to experimental data. Since the transcription activity of a TF is approximated by the corresponding TC based on the expression levels of individual genes through the law of mass action, we need to collect data of TCs. In the budding yeast S.cerevisiae, ChIP-chip experiments have been utilized to elucidate the binding interactions between 6270 genes and 113 preselected TFs (Lee et al., 2002). By checking yeast protein complexes in MIPS (Mewes et al., 2002), we found that 26 TFs are in protein TCs. The number is not so significant mainly because the current deposition of protein complexes is highly incomplete. With the increasing deposition of information, more protein complexes will be available. Among these 26 TFs, some are related to yeast cell cycle (Spellman et al., 1998) and some are related to polyphosphate metabolism in S.cerevisiae (Ogawa et al., 2000). We report the results of TRNinfer on these two datasets.
3.2.1 Yeast cell-cycle data
We first tested our method using gene expression data for cell-cycle studies in S.cerevisiae (Spellman et al., 1998). According to the ChIP experiments (Lee et al., 2002), there are 11 TFs that are known to be related to cell-cycle regulation, among which five TFs are in four different TCs. The details of these TFs and their TCs are given in Table 6 in Supplementary materials. Except these five TFs, we selected eight genes that are closely related to cell cycle based on their function information. According to the gene expression data from Spellman et al., 1998, we generated four datasets with the number of time points 18, 17, 24 and 14, respectively.
TRNinfer solved the above time-course datasets within five seconds by using LP model (10) and we obtained a transcription regulatory connection matrix which characterizes the interaction between target genes and TFs (or TCs) (see Table 7 of Supplementary materials). The corresponding TRN is shown in Figure 4, where the bold edges indicate that the regulatory relations are confirmed by documented information and the thin edges are potential regulations. The unconfirmed inferred regulations are denoted by dotted edges. From Figure 4 we can see that most regulatory relations inferred by our method can be confirmed (Teixeira et al., 2006). In this transcriptional network, DNA-binding proteins SWI4 and SWI6 are transcription cofactors, forming a complex to regulate transcription at the G1/S transition. They are involved in meiotic gene expression, localization regulated by phosphorylation and potential Cdc28p substrate (Teixeira et al., 2006). SWI6 and MBP1 are believed to involve in a same regulatory module by many literature (Bar-Joseph et al., 2003, Wu et al., 2006). MBP1 forms a complex with SWI6 that binds to MluI cell-cycle box regulatory element in promoters of DNA synthesis genes. The inferred yeast cell-cycle transcription network demonstrates that the proposed method can provide not only direct relations between TFs and target genes but also the interactions between co-regulating proteins and target genes.
We introduced a P-value formula used in Husmeier (2003) to evaluate the significance of the predicted interactions between TFs and target genes:
In order to justify the benefit of considering TCs, we conducted comparative experiments for three methods: LP method based on TCs, LP method based on only mRNA levels of TFs and singular value decomposition (SVD) method based on mRNA levels of TFs (Yeung et al., 2002). Sensitivity and specificity are used to evaluate the inference results. Owing to the scarcity of gold standard information (true network), we treated the regulations with supporting evidences in YEASTRACT (Teixeira et al., 2006) as true edges and all other regulations as true negatives. The results are plotted in Figure 5a, where the curves are obtained by setting different thresholds on the inferred connection matrix. From Figure 5a, we can see that considering TCs can make the inference more accurate. The worse performances of LP_mRNA and SVD_mRNA come from two aspects: one is that there may exist bias in the concrete sensitivity and specificity values since all unsupported regulations are treated as false positives which may not be true. Further biological experiments are needed to confirm them. Another is that, in this example, four TFs are all in TCs, which means that these TFs recruit other cooperative proteins in transcription processes, so only using the mRNA levels of TFs as their activity profiles will lead to big deviation. We also conducted similar experiments like Husmeier (2003), i.e. added extra artificial TF complexes to test the robustness of our method. For this dataset, two artificial TF complexes (i.e. increased 2 × 13 = 26 possible edges) whose activity profiles were randomly generated. The computational result indicates that the original confirmed edges are almost not affected and only a very small portion of the added possible edges are inferred as false positives, which demonstrates the robustness of our method (see Figure 8a in Supplementary materials).
To test the possibility of approximating TFAs by the expression levels of individual genes in TCs, we checked the periodicity of the activity levels of the TFs because it is believed that the activities of TFs related to cell cycle tend to be periodic. The activity profiles of MBP1 (TC1: 510.190.70) and SWI4/SWI6 (TC4: 510.190.60) are plotted in Figure 7 of Supplementary materials. The activity profiles of these regulation factors show highly periodic patterns which are consistent with common biological knowledge. In contrast, the individual gene expression profiles of MBP1, SWI4 and SWI6 are not periodic. This fact can be confirmed by Fisher's g-test in Wichert et al. (2004). Table 3 lists the P-values of the periodicity for the activity profiles and the gene expression profiles of these genes which indicate that their activity profiles are more periodic than the corresponding gene expression profiles. Although the periodicity of the activity profiles for MCM1 (TC2:510.190.120) and STB1 (TC3:510.190.150) is not so significant, it is still more significant than their gene expression profiles in terms of P-value. These results indicate that the activity of a TF cannot be well represented by its mRNA level (Boulesteix and Strimmer, 2005; Liao et al., 2003) but depends on the expression levels of all genes in the transcription complex. Therefore, those methods which only use the mRNA level of one TF or constituent gene are not biologically reasonable and likely to misexplain important transcriptional regulatory relations.
3.2.2 Polyphosphate metabolism data
Now, we test TRNinfer using gene expression data for polyphosphate metabolism in S.cerevisiae (Ogawa et al., 2000). Among the TFs related to polyphosphate metabolism verified by the ChIP experiments (Lee et al., 2002), there are 14 TFs appearing in nine different TCs. The details of these TFs and their TCs are given in Table 8 of Supplementary materials. This gene expression data have totally eight conditions. Among the genes in this dataset, those with change of two-fold up or down in at least two-time points of the expression levels are believed to be closely related to polyphosphate metabolism. In such a way, totally 64 genes (including 14 TFs) form a test data.
TRNinfer solved this non-time-course dataset within one second by using LP model (12) and we obtained a TRN with 106 links. Since the network is large, only a part with three TFs is shown in Figure 6, where the types of edges have similar meanings to those in cell cycle TRN. The genes with same functions are denoted by the same color and the functions of the genes with gray color are unknown yet. By the formula (14), the inferred polyphosphate metabolism transcriptional network is significant (P < 0.002), whereas the average P-value of the inferred network on multiple permuted datasets is 0.33, and the P-value of the inferred network is 0.32 by using the mRNA level of a TF as activity profile. In this transcriptional network, the retrograde (RTG) RTG1 and RTG3 are bHLH/Zip proteins and transcription cofactors. RTG3 forms a complex with RTG1 to activate the RTG and target of rapamycin (TOR) pathways (Teixeira et al., 2006). Again the proposed method can also provide the interactions between co-regulating proteins and target genes in addition to the direct relations between TFs and target genes.
The comparative results of three methods mentioned in above subsection on this dataset are plotted in Figure 5b, which again demonstrates the higher inference accuracy of LP method with considering TC activities. The low sensitivity and specificity values of three methods are because they all inferred a smaller size network with fewer edges than the ‘true’ network. Similarly, three artificial TF complexes (i.e. increased 3 × 64 = 192 possible edges) were added to test the robustness of our method. The computational result (see Figure 8b in Supplementary materials) again demonstrates that our method is robust to false positives.
4 CONCLUSION AND DISCUSSION
Revealing the direct relationships between the target genes and TFs with co-regulators is essential for understanding the complex regulatory mechanisms in cellular systems. A TF usually cannot facilitate the transcription without the cooperation of other proteins, so the activity of a TF is not simply determined by its mRNA but depends on the expression levels of all genes in the TC. In this paper, we proposed a method for inferring TRN by TCs based on mass action law, with major features as follows:
The proposed method takes into account protein cooperation in transcription process and infers TRNs without the requirement of experimentally measuring the activities of TFs.
The inference process is formulated as a LP problem which ensures global solution can be efficiently found and makes our method scale up well to large-scale networks (e.g. with over hundreds or even thousands of genes).
We can identify the direct regulations not only between TFs and target genes but also between co-regulating proteins and target genes.
The model can deal with multiple gene expression datasets from different experiments and be easily extended to accommodate the non-linear formalism of transcription process.
We tested our method by using both simulated data and several experimental data in yeast. Extensive results illustrated the effectiveness of the proposed method on predicting transcription regulatory relationships. Currently, the number of known TCs is small, especially for organisms except yeast. With the rapid development of biotechnologies, more datasets will be available for us which can enhance the significance of the proposed method.
A limitation in our approach is that we ignore post-translational modifications in transcription processes. As is well known to us, post-translational modifications such as phosphorylation, protein degradation may affect the activity of a TF. Current studies on the effects of post-translational modifications are only in the initial stages, and it is still a difficult task to quantify the effects of these biological events (Tootle and Rebay, 2005). Although using the mRNA levels of all constituent genes in a TC is one-step forward, more elaborated formulations are needed in the future. Furthermore, the mechanisms of competitive binding and combinatorial control between multiple TFs will also affect TF activity levels. Integrating detailed cooperation processes of TFs in gene regulation will make the reconstruction of TRN more realistic.
In the inference process of transcriptional network, in order to avoid obvious false positives and improve the inference accuracy, generally we require that there is at least one candidate TC to be included in dataset for each gene. If such information is unavailable, we can filter out those genes according to the weak correlations between and all TFAs among A or biological function annotations as a preprocessing procedure.
The authors are grateful to the anonymous referees for their valuable comments and suggestions. This research work is supported by JSPS under JSPS-NSFC collaboration project, the National Nature Science Foundation of China (NSFC) under grant No. 10701080 and the Ministry of Science and Technology, China, under grant No. 2006CB503905.
Conflict of Interest: none declared.