Abstract
Motivation: The sequence specificity of DNAbinding proteins is typically represented as a position weight matrix in which each base position contributes independently to relative affinity. Assessment of the accuracy and broad applicability of this representation has been limited by the lack of extensive DNAbinding data. However, new microarray techniques, in which preferences for all possible Kmers are measured, enable a broad comparison of both motif representation and methods for motif discovery. Here, we consider the problem of accounting for all of the binding data in such experiments, rather than the highest affinity binding data. We introduce the RankMotif++, an algorithm designed for finding motifs whenever sequences are associated with a semiquantitative measure of proteinDNAbinding affinity. RankMotif++ learns motif models by maximizing the likelihood of a set of binding preferences under a probabilistic model of how sequence binding affinity translates into binding preference observations. Because RankMotif++ makes few assumptions about the relationship between binding affinity and the semiquantitative readout, it is applicable to a wide variety of experimental assays of DNAbinding preference.
Results: By several criteria, RankMotif++ predicts binding affinity better than two widely used motif finding algorithms (MDScan, MatrixREDUCE) or more recently developed algorithms (PREGO, Seed and Wobble), and its performance is comparable to a motif model that separately assigns affinities to 8mers. Our results validate the PWM model and provide an approximation of the precision and recall that can be expected in a genomic scan.
Availability: RankMotif++ is available upon request.
Contact:quaid.morris@utoronto.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
Understanding protein–DNA interactions is critical to understanding genome function. Recognition of specific DNA sequences by proteins is central in processes such as control of gene expression and chromosome replication, and as a consequence plays an extensive role in shaping genome sequence. On the basis of domain structure, over 6% of all human genes encode potential DNAbinding proteins (Messina et al., 2004), and the majority of sequence conserved among verterbrates is noncoding and may be regulatory (Pennacchio et al., 2006; Siepel et al., 2005).
Given the importance of protein–DNA interactions in computational biology, it is remarkable that the simple position weight matrix (PWM) model remains a virtually unchanged standard more than 20 years after its introduction (Berg and von Hippel, 1987). The PWM is a simple model that predicts the binding energy of a protein–DNA complex. It assumes that each DNA base in the binding site of the protein makes an independent, additive contribution to the binding energy of the protein–DNA complex. Other representations of DNAbinding preferences, such as position frequency matrices (PFMs), that assume that bases contribute independently to the binding affinity can be transformed into PWMs.
For most transcription factors (TFs), the PWM is derived from a few dozen sequences that the TF is known to bind. These sequences are most commonly obtained from in vitro selection (Vlieghe, 2006), a procedure that selects for and reports only the most highly specific binding sequences (Roulet et al., 2002). PWMs are increasingly inferred from ChIPchip data as well (MacIssac et al., 2006), despite binding sites in vivo being influenced by factors other than binding specificity [e.g. protein–protein interactions and chromatin structure (Liu et al., 2005)]. In fact, one recent report suggests widespread use of predicted weak DNAbinding sites in vivo (Tanay, 2006), suggesting that stringent in vitro selections may not capture all of the information relevant to in vivo function of TF DNAbinding activities.
To our knowledge, the additive PWM model has not been rigorously challenged in its ability to correctly predict affinity to all possible binding sequences, largely due to the difficulty of making measurements on an exhaustive collection of sequences in vitro and the presence of confounding factors in vivo. Until recently, only a relatively small number of sequences had ever been tested for any given TF (Man and Stormo, 2001; Mukherjee et al., 2004). However, development in the last two years of microarraybased techniques for measuring protein–DNA associations (Berger et al., 2006; Liu et al., 2005; Mukherjee et al., 2004; Warren et al., 2006) promises a dramatic expansion in protein–DNA interaction data.
These data present new opportunities to revisit both the PWM model of protein–DNA affinity as well as the various algorithms designed to learn this model from binding data. The validity of the PWM model can be tested by measuring its ability to reproduce the protein–DNA interaction data. However, fitting a PWM model, or motif, to these microarraybased measurements of binding affinity is difficult. In these data, a microarray intensity level is a semiquantitative measurement of the affinity of the TF for the DNA sequence represented by the probe. Traditional motif finding algorithms like Bioprospector (Liu et al., 2001) and MEME (Bailey and Elkan, 1994) mostly ignore this intensity and treat all probe sequences above a given threshold equally. As a result the motifs produced by these algorithms are very sensitive to the ad hoc choice of the threshold.
Two newer popular motif finding algorithms, MDScan (Liu et al., 2002) and MatrixREDUCE (Foat et al., 2005, 2006), incorporate intensity data into their search but make different assumptions about the relationship between binding affinity and the microarray intensity. MDScan uses the intensity data to classify bound probes into two groups based on the rank of their intensity. The higher ranked group contributes more to the PWM. MatrixREDUCE uses the microarray intensity as a surrogate for binding affinity and fits a motif model by assuming a linear relationship between the two. Though microarray intensity can be to used to predict the relative binding preferences of a TF for given pairs of probes, the exact functional form of the relationship between the TFbinding affinity for a probe sequence and its intensity level remains unclear and may vary depending on the experimental technique.
To address this problem, algorithms have recently been developed that fit motif models to intensity rankings rather than the intensities themselves (Berger et al., 2006; Chua et al., 2006; Tanay, 2006). One of these, PREGO (Tanay, 2006), was designed originally for ChIPchip data and fits a PWM model by attempting to maximize the Spearman rank correlation between binding affinities and the microarray intensities using hillclimbing optimization. Another, Seed and Wobble (Berger et al., 2006), was designed for protein binding microarray (PBM) data (Mukherjee et al., 2004) and calculates an enrichment score similar to a MannWhitney U statistic for each 8mer and gapped 8mer. A PWM model is constructed by combining the enrichment scores of the best ‘seed’ 8mer and those of single base ‘wobbles’ around the seed using a Boltzmann distribution. These and other algorithms based on intensity rankings may be sensitive to noise in the microarray intensity levels that can translate into large changes in the relative rankings of probes with similar TFbinding affinities.
Here we introduce a new motif finding algorithm, RankMotif++, which uses the microarray data to infer relative binding preferences of the TF for pairs of probes and learns a motif model that is consistent with these preferences. By predicting binding preferences inferred from microarray intensities rather than the intensities themselves, RankMotif++ is applicable for a wide variety of relationships between the TFbinding affinity and semiquantitative readout of that affinity. Because it relies on inferred binding preferences rather than absolute rankings, RankMotif++ can be less sensitive to noise in probe intensity measurements. We evaluate our algorithm and test the general validity of the PWM model of binding affinity using data from a recent study by Berger et al. (2006).
Berger et al. (2006) used PBMs to analyze five different TFs. These were each analyzed on two different array designs containing independent de Bruijn sequences, such that each 10mer was represented once on each array, but embedded in a different 35base context from the other array. We have used these data to both gauge the reproducibility of the individual Kmerbinding affinities inferred from the PBM data for the TFs, as well as the accuracy of the PWM model derived from several stateoftheart approaches. The results of this analysis support the general validity of the PWM model, highlight some cases where it may or may not be advantageous to use individual Kmer scores, and also serve as a guide to the accuracy of genomic scans using either a PWM or a library of Kmers.
2 ALGORITHM
The RankMotif++ algorithm is based on a probabilistic model of binding preferences. The binding preferences of the TF can be inferred from the microarray intensity data using standard statistical techniques. In our model, the probability that one probe will be observed to be preferentially bound by the TF over another probe depends on the ratio of the affinities assigned to each probe sequence by the motif model. RankMotif++ fits PWM by searching for those that maximize the likelihood of the observed preference data under this probabilistic model.
2.1 Probabilistic model of binding preferences
Let a potentially noisy observation of the binding preferences of a TF be represented by a set of binary random variables , where N is the number of probes and X_{ij} = 1 if probe sequence i is observed to be preferred over sequence j. By definition we set X_{ij} = 1 − X_{ji}.
Let the outcome of a particular experimental assay, for example a PBM experiment be represented by a configuration of a subset of the binary random variables in X. Though, in general, a given experiment may result in an observation of only a subset of the possible preferences, to simplify the presentation of the model, we will assume that all possible preferences are observed. Later on we will describe how to relax this assumption. Without loss of generality, we will also assume for the time being that the probes are indexed so that if i > j then X_{ij} = 1.
Let s^{i} be the sequence associated with probe i and be the set of all probe sequences, and let Θ = {θ _{1}, θ _{2}, … , θ _{M}} represent the parameters of the motif model. Our model also includes a scaling factor that governs the effect that changes in predicted binding affinity have on the probability of observing a preference in the microarray data. The value w is also fit by maximizing the log likelihood.
Given Θ, w and S, our model assumes that the preferences are conditionally independent, so the log likelihood of Θ and w, , can be written as
We model the probability of observing a given preference P(X_{ij} = 1  Θ, w, S) as depending only upon the scaled log ratio of the binding affinities of the motif model for sequences s^{i} and s^{j}. If we set where g(s^{i}, Θ ) is the binding affinity of motif model Θ for s^{i}, then we write this probability as:
where σ (x) = 1/(1+e^{−x}) is the wellknown logistic sigmoid function. Note that we can rewrite equation (1) asIn the following when the values of Θ, w and S are obvious, we will use the notation z_{ij} to represent P(X_{ij} = 1 Θ, w, S).
2.2 Maximum likelihood estimation of the motif model
We maximize the likelihood of the binding model with respect to Θ and w using conjugate gradient descent (Press et al., 2002) on the log likelihood of the binding data. In this section, we describe how the gradient of the log likelihood is calculated.
Since X_{ij} = 1−X_{ji}, we can write
and thus Further noting that and that after taking the derivative of equation (3) with respect to an element θ_{m} of Θ, we get which can be simplified towhere r_{i} = 1+∑ _{j ≠ i}X_{ij} is simply the rank of probe i in a list of probe intensities sorted in increasing order, and can be interpreted as the expected rank of i under the binding preference model. So the partial derivative of the cost function with respect to θ_{m} depends upon the difference between the actual and expected rank of probe i, weighted by the derivative of f(s^{i},Θ ) with respect to θ_{m}. Similarly, the partial derivative with respect to w is
2.3 Definition of the motif model and composition function
None of the above derivation depends on the particular form of our motif model Θ or our method of estimating the binding affinity f (· , · ). As such, our binding preference model can easily be generalized to more complex motif models, so long as exists for all elements θ_{m} of Θ.
To investigate the accuracy of the PWM model and allow direct comparison against other PWMbased motif finding algorithms, we take Θ as representing a PWM, whose (b,k)^{th}element, Θ _{bk}, is the weight of base at position k in the binding site. Each column, Θ_{k}, of the PWM can be used to define a multinomial distribution over D as follows:
and the probability of an arbitrary Kmer s under the model is where s_{k} is the kth element of s.A given sequence can contain multiple Kmers that are good matches to the PWM. A number of methods have been proposed to combine the probabilities assigned to the individual Kmers within a sequence into a composite measurement of the binding affinity of the PWM for the whole sequence. We use a combination mechanism similar to the GOMER (Granek and Clarke, 2005) scoring function. In particular, we define to be the probability under the PWM that the TF binds to at least one of the Kmers that are subsequences of s^{i}, i.e.
where L^{i} is the length of sequence s^{i}, and is the subsequence of s^{i} that starts at the (t+1)th element and ends at the (t+K)th element inclusive. Assuming that all the Kmers in s^{i} can be bound independently, g(s^{i}, Θ ) is the probability that at least one of them will be bound. Our combination mechanism differs from GOMER in how we calculate the probability of binding to a given Kmer using the PWM. We made this change in order to ensure that our Kmer probability function had continuous partial derivatives.The partial derivatives of f(s^{i},Θ ) with respect to Θ _{bk} are:
where δ _{st+k,b} = 1 if s_{t+k} = b, otherwise δ _{st+k,b} = 0. Inserting equation (10) into equation (5) gives the partial derivatives necessary to complete the computation of the gradient of L(Θ, w).2.4 Partial observations of binding preferences
When the difference (or ratio) between the observed intensities of two probes is within the noise level of the assay, we treat the TFbinding preference between those two probes as being unobserved. These unobserved preferences are easily handled in the binding preference model. Because the observed preferences are conditionally independent given the parameters of the model and the probe sequences, unobserved preferences can simply be marginalized out of the likelihood. The only change that needs to be made to the model is in the definition of r_{i} and in equation (6). If T is the set of probe pairs (i, j) for which binding preferences have been observed, then the updated definition of r_{i} and are
Note that if (i,j) ∈ T then (j,i) ∈ T.3 METHODS
3.1 PBM intensity data
We downloaded the PBM intensity measures from the Supplementary Material section of Berger et al. (2006). These data contain measurements from two different universal microarrays for each of the five transcription factors Cbf1, Ceh22, Oct1, Rap1 and Zif268.
We normalized the microarray intensity data in two steps. We first translated the microarray intensities by adding a constant to each measurement so that the minimum intensity was equal to one. We then logtransformed the translated data. We call these logintensities the ‘normalized intensities’ and use y_{i} to denote the normalized intensity of probe i.
3.2 Deriving proteinbinding preferences from normalized intensities
Because microarray intensity data is noisy, the relative probe intensity levels are not always a reliable indicator of transcription factor binding preferences. As such, to define a reliable set of proteinbinding preferences, we separately estimated the intensity noise level for each array and only predict a binding preference when one of the probes has a significantly higher normalized intensity than the other.
Typically in microarraybased assays of transcription factor binding affinity, only a small proportion of probes detect sequencespecific binding. As such, we can estimate the SD of the noise in the normalized intensity of nonspecific binding using a robust estimate of the SD, σ, of the entire distribution of normalized intensity levels. In our experience, σ is also an empirical upper bound on the SD of the normalized intensity noise affecting probes detecting sequencespecific binding. As such, in the following, we will use σ as an estimate of the SD of the noise affecting all probes.
Our robust estimate is calculated by computing the median absolute deviation (MAD) of the distribution of normalized intensities and setting σ to the MAD divided by 0.6745 (the MAD of the unit normal distribution).
We define the set of positive (i.e. bound) probes as those whose normalized intensity , where m_{y} is the median of {y_{1}, y_{2}, … , y_{N}}. These are probes whose normalized intensity we deem to be significantly different, given the noise level σ, from the distribution of intensities of probes detecting nonspecific binding. The proportion of positives varies between 0.9 and 5.4% among the ten arrays. Probes that are not positives are called negatives.
We set X_{ij} = 1 if and only if y_{i} has significantly higher normalized intensity than y_{j} given σ. We represent that condition as . If by the former condition neither X_{ij} = 1 or X_{ji} = 1, i.e. , then we deem the intensity difference between the two probes to be within the noise level of the array. In this case, we represent X_{ij} as being unobserved and thus do not try to predict the binding preference between these two probes.
3.3 Fitting RankMotif++ motif models
We use all the positives and a random subset of the negatives to learn the motif model in RankMotif++. The negative subset is a random subsampling of all negatives with size of 800.
Like many other motif finders, the log likelihood optimized by RankMotif++ is not convex, so different initializations can generate different results. For the experiments reported here, we used three different initialization for each motif length and we found that these three were enough to locate good local optima.
To initialize RankMotif++ for motifs of length seven, we identified the three 7mers most associated high intensity probes. We used the Wilcoxon–Mann–Whitney test on each 7mer to score the distribution of intensities of the probes that did contain the 7mer against the distribution of those that did not. From the three 7mers with the lowest Pvalues, we generate three initial position frequency matrices to seed the motif search. For each of these 7mers, we initialize a PFM by putting 0.7 for the consensus base in each column and 0.1 for the nonconsensus base. We transform the PFM to a PWM by taking the log of each entry. In some but not all cases, the final PWM that our optimization procedure converges on has a similar consensus to the initial PWM.
To initialize the search for motifs of length K > 7, we generate two PFMs derived from the highest scoring RankMotif++ PFM of length K − 1 by adding a column of 0.25 on either side. We also generate one random PWM with weights uniformly distributed between −0.05 and 0.05. Occasionally the search starting from the randomly initialized PWM leads to the highest scoring PWM.
We computed motifs with widths from 7 to 13, and we used the motif model with the highest log likelihood for the test array evaluations.
3.4 Fitting the MatrixREDUCE motif models
We used MatrixREDUCE to generate one motif of length range between 7 and 13 by setting parameters between 7 and 13 and setting to 1. In computing this one motif, MatrixREDUCE considers a number of initial seeds and uses the highest scoring one among them, and it automatically chooses the best motif width among the range input. The transcription factors in the PBM study are unlikely to bind as homodimers so we set and to 0. We use the default settings for the other parameters.
We applied MatrixREDUCE to both the PBM intensity data directly and to the normalized intensities of the positive and negative probe sets used by RankMotif++. The overall performance of MatrixREDUCE on the benchmarks was similar for the two different inputs. The PWMs in Figure 1 are generated from raw PBM intensities but for consistency with other methods, we benchmark MatrixREDUCE on the normalized intensities inputs.
3.5 Fitting the MDScan motif models
We also vary the MDScan motif widths w between 7 and 13. MDScan uses a ‘top’ set and a refinement set to fit its motif model. We set the size of the top set to 20 which is the largest size used in Liu et al. (2002). We set the refinement set size to be equal to the number of positive probes on each array, and we use the default settings for all the other parameters. For each length, we only evaluate the motif model assigned the top score by MDScan.
The motif scores provided by MDScan always increased for the shorter motifs. As such, we used an alternative method for determining the optimal motif width. Using the top scoring motif model for each width, we scored each positive probe sequence of the training array using equation (9), and then we selected the motif model whose sequence scores had the highest Spearman correlation with the normalized intensities for the positive probes.
3.6 Fitting the PREGO motif models
The input probe sets for PREGO were the same as those used by RankMotif++.
We set the PREGO parameters to 7 and to 14. With these settings, PREGO reports the highest scoring motif of length range between 7 and 13. In order to force PREGO to run on our normalized intensities, we set to 1. We use the default settings for the other parameters.
3.7 Fitting the Seed and Wobble models
We were supplied with Seed and Wobble PWMs by Anthony Philippakis. We asked that these PWMs be fit separately to each array. For one of the arrays, Ceh22 Array 1, he supplied us with two slightly different PWM models.
3.8 Fitting the fully specified 8mer motif model
Each 8mer is associated with the median normalized intensity of the probes in which it appears in the training array.
3.9 Scoring test set probe sequences under each motif model
Note that all the motif models were only fit to training arrays, here we describe how these motif models were used to score the probe sequences on the test arrays.

8mer: The score assigned to a test set probe is simply the maximum of training array median intensities associated with the 8mers present in the probe.

RankMotif++, MDScan, PREGO: We use equation (9) where s is the test probe sequence and Θ is the motif model fit to the training array.

Seed and Wobble: We use equation (9) as above except for the two Ceh22 Array 1 PWMs. For Ceh22 Array 1, we scored each probe separately under the two PWMs [using equation (9)] and then set the composite score for the probe to be s = 1 − (1−s_{1})(1−s_{2}) where s_{1} and s_{2} were its scores under the two PWMs respectively.

MatrixREDUCE: The MatrixREDUCE motif model is neither a position weight matrix nor a position frequency matrix. As such, we use the MatrixREDUCE scoring function , where θ is the MatrixREDUCE motif model, s^{i} is the test probe sequence, L^{i} is the length of s^{i} and is the (t+K)th element of sequence s^{i}.
3.10 Computing performance metrics
To test the generalization ability of each of the motif models, we train each model on intensity data from one of the array designs and tested generalization performance on the other. Each design was used once as a training array and once as a test array.
We score each model's ability to correctly identify and rank the positive probes (as previously defined) in the test array using the methods as described in Section 3.9.
For each TF and each method, we calculate the Spearman correlation between the sequence scores and the intensities for the positive probes, which measures how well each motif model can reproduce the rankings of the intensities of the positives.
By thresholding the sequence scores at a given threshold, we can make predictions of which test set probes will be positive. For each threshold, we can calculate the true positive rate (, also called the recall or the sensitivity), the false positive rate () and the precision (). By decreasing the sequence score threshold and computing the true and false positive rates and the precision at these decreasing thresholds, we plot the ROC curves and the precision/recall curves shown in Figure 2.
4 RESULTS
Figure 1 shows the PWMs produced by each method for the five different transcription factors on each of the two array designs using sequence logos generated using enoLogo (Workman et al., 2005). There is a general overlap in the consensus sequences of the PWM but also consistent differences between the PWMs from different methods. In general, MatrixREDUCE and RankMotif++ produce more degenerate PWMs than MDScan. Seed and Wobble motifs are almost always the widest and the PREGO motifs are always between 7 and 8 columns. In general, PWMs from the same method run on data from different arrays are more similar than PWMs from different methods run on the same array data.
To evaluate whether these differences in PWMs would lead to differences in genomewide predictions of TF binding, we evaluated the accuracy with which each PWM could reproduce the ranking of probe intensities based on the probe sequence.
Each probe sequence was scored by each PWM, and the rankings of the probe scores were compared to the rankings of the measured probe intensities. In order to judge the validity of the positionindependent model, we included a fully specified model of 8mer TFbinding preferences in our evaluation. To ensure that the performance measures were not influenced by probes not bound by the TF, we defined a set of ‘positive’ probes that had intensities at least four SDs above the median intensity. For each PWM, we determined how highly the positive probes ranked in the score rankings (Table 1 and Figure 2) and how well the scoring ranking reproduced the observed intensity rankings (Table 2).
Table 1 shows the Spearman correlation coefficients between the probe scores and the probe intensities for each of the six methods. All of these correlations are statistically significant under a onesided test, the largest Pvalue of any element in the table is less than 10^{−5} and only four Pvalues were larger than 10^{−10}. RankMotif++ has the highest correlation among the PWMs in eight out of the ten conditions with MatrixREDUCE and Seed and Wobble having the highest correlation for both Oct1 arrays. The low MDScan correlations are likely due to the lack of degeneracy of the MDScan PWMs: between 50 and 75% of positive probes do not have any hits to the MDScan PWMs and are thus assigned the same score as most of the nonpositive probes. The degeneracy of the MDScan model is governed by one of the algorithm's parameters. Though we set this parameter to the highest level used in Liu et al. (2002), it is possible that the performance of MDScan would improve with a different setting.
Table 2 shows the sensitivity (true positive rate) of each PWM at 99% specificity (1% false positive rate). RankMotif++ has highest sensitivity in seven out of the ten conditions. At this specificity level, the MDScan PWMs still have hits in the positive probes. Figure 2 provides a more detailed view of the predictive performance of each method on the positive probes from the array design # 1.
RankMotif++ recovers probe rankings better any of the other four methods though does not perform quite as well as the fully specified 8mer model. However, RankMotif++ has higher correlations and higher sensitivities than the 8mer model in three of the ten conditions, particularly for Rap1 which has binding profiles with more than eight conserved bases. On Cbf1 and Ceh22, 8mer median and RankMotif++ have similar correlations and sensitivities. Though the 8mer median performance is better on Oct1 and Zif268.
5 DISCUSSION
We have made an investigation into the general validity of the PWM model of transcription factor DNA binding, and in the same analysis introduced a new method for learning PWMs. As a gold standard, we used a fully specified model of 8mer binding preferences.
Surprisingly, the predictive accuracy of the 8mer model at predicting the 35mer probe intensities was not as high as we had expected. Even for Cbf1, which has a core motif of six bases, a value of 50% recall is associated with only 60% precision by 8mer scores. Spearman correlations among the highest intensity spots (positives) are also far from absolute, suggesting that our stringent definition of positives is not causing a threshold effect. This phenomenon could represent a performance limit imposed by the reproducibility or noise level of the assay or the way the 8mer scores are calculated, or an effect of sequence or structural properties of flanking bases in the 35mer probe. Indeed Berger et al. (2006) report an effect of both the position of the consensus on the probe and its sequence context on the intensities of probes containing the Zif268 consensus. However, they also report that 8mer median intensities calculated from different arrays are highly reproducible. In our hands, the Spearman correlations between the highest intensity 8mer scores calculated separately on each array average 0.7593 over the five TFs. However, because the 8mer scores each are calculated from 16 separate 35mer intensity measurements, this correlation may represent an overestimate of their performance on a genomewide scan.
By several criteria, our method, RankMotif++, performed favorably in comparison to currently used tools. In some circumstances, RankMotif++ even performed better than the 8mer model, for example in the case of Rap1. We note that the binding profile of Rap1 is greater than 11 bp, so the PWM models are inferring the binding affinities of Rap1 to 12 and 13mers not represented in the training array and thus cannot be fully captured by the 8mer model. On the whole, PWMs learned by RankMotif++ on PBM microarray array data are more accurate predictors of transcription factor binding preferences than those learned by two widely used motif finders for use with semiquantitative data: MatrixREDUCE and MDScan and two new algorithms that use intensity rankings: PREGO and Seed and Wobble. We suggest that these differences in performance are due differences in assumptions that each method makes about the relationship between transcription binding affinity and semiquantitative readout and differences in each method's sensitivity to intensity noise.
ACKNOWLEDGEMENTS
We would like to acknowledge helpful discussions with Anthony Philippakis, Mike Berger and Martha Bulyk and thank Anthony for supplying us with Seed and Wobble PWMs. X.C. was supported by a grant from Genome Canada through the Ontario Genomics Institute. This work was supported in part by an NSERC operating grant to Quaid Morris.
Conflict of Interest: none declared.
Comments