Sin-Ho Jung, Sample size for FDR-control in microarray data analysis, Bioinformatics, Volume 21, Issue 14, July 2005, Pages 3097–3104, https://doi.org/10.1093/bioinformatics/bti456
Abstract
Summary: We consider identifying differentially expressing genes between two patient groups using microarray experiments. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressing genes and the effect sizes among the differentially expressing genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressing genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a real study.
Contact: jung005@mc.duke.edu
1 INTRODUCTION
The microarray method has been widely used for identifying differentially expressing genes, called prognostic genes, in subjects with different types of disease. Statistical procedures to identify differentially expressing genes involve a serious multiple comparison problem since we perform as many hypothesis tests as the number of candidate genes in the microarrays. If we use a type I error rate α in each test, then the probability of rejecting any hypothesis will greatly exceed the intended overall α level. In order to avoid this pitfall, two approaches are widely used: false discovery rate (FDR) control and family-wise error rate (FWER) control.
Sample size calculation is a critical procedure when designing a microarray study. There have been several publications on sample size estimation in the microarray context, e.g. Simon et al. (2002). Some focused on exploratory and approximate relationships among statistical power, sample size (or the number of replicates) and effect size (often, in terms of fold-change), and used the most conservative Bonferroni adjustment for controlling FWER (the probability to discover one or more genes when none of the genes under consideration is prognostic) without any attempt to incorporate the underlying correlation structure (Wolfinger et al., 2001; Black and Doerge, 2002; Pan et al., 2002; Cui and Churchill, 2003). Jung et al. (2005) incorporated the correlation structure to derive an accurate sample size when controlling the FWER.
Some researchers proposed a new concept of testing error called FDR, defined as the expected value of the proportion of the non-prognostic genes among the discovered genes (Benjamini and Hochberg, 1995; Storey, 2002). Controlling this quantity relaxes the multiple testing criteria compared with controlling the FWER in general, and consequently increases the number of declared significant genes. Operating and numerical characteristics of FDR are elucidated in recent publications (Genovese and Wasserman, 2002; Dudoit et al., 2003).
Lee and Whitmore (2002) considered multiple group cases, including the two-sample case, using ANOVA models and derived the relationship between the effect sizes and the FDR based on a Bayesian perspective. They discuss a power analysis without involving the multiple testing issue. Müller et al. (2004) chose a pair of testing errors, including FDR, and minimized one while controlling the other at a specified level using a Bayesian decision rule. They proposed a simulation algorithm to demonstrate the relationship between the sample size and the chosen testing errors based on asymptotic results. This approach requires specification of complicated parametric models for prior and data distributions, and extensive computing for the Bayesian simulations. Most of the existing studies on FDR-control do not show the explicit relationship between the sample size and the effect sizes, for various reasons. For example, Lee and Whitmore (2002) and Gadbury et al. (2004) modelled a distribution of p-values from pilot studies to produce sample size estimates but did not provide an explicit sample size formula. None of the aforementioned studies based on FDR evaluated their sample sizes using simulations.
In this paper, we propose a sample size estimation procedure for FDR-control. We derive the sample size required for a specified number of true rejections (i.e. identifying the prognostic genes) while controlling the FDR at a desired level. As input parameters, we specify the allocation proportions between two groups, the total number of candidate genes, the number of prognostic genes, the effect sizes of the prognostic genes in addition to the required number of true rejections and the FDR level. In general, our procedure requires solving an equation using a numerical method, such as the bisection method. However, if the effect sizes are equal among all prognostic genes, the equation can be solved to give a closed form formula. We review the background of FDR and its estimation method in Section 2, and propose a new sample size method in Section 3. In Section 4, we discuss simulation studies that are conducted to show that the calculated sample sizes are accurate, and demonstrate an application of our method to a real study. van den Oord and Sullivan (2003) considered a similar setting for sample size calculation, but their formulation is so general that they do not provide an explicit formula in any specific case.
2 FALSE DISCOVERY RATE
Suppose that we conduct m multiple tests, of which the null hypotheses are true for m0 tests and the alternative hypotheses are true for m1 (= m − m0) tests. The tests declare that, of the m0 null hypotheses, A0 hypotheses are null (true negative) and R0 hypotheses are alternative (false rejection, false discovery or false positive). Among the m1 alternative hypotheses, A1 are declared null (false negative) and R1 are declared alternative (true rejection, true discovery or true positive). Table 1 summarizes the outcome of m hypothesis tests.
Benjamini and Hochberg (1995) propose a multi-step procedure to control the FDR at a specified level. However, this is known to be conservative, and the conservativeness increases in m0, see, e.g. Storey et al. (2004).
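As a point of reference, the step-up procedure of Benjamini and Hochberg (1995) can be sketched as follows. This is a minimal illustration assuming independent p-values; the function name `bh_reject` and the toy p-values are hypothetical and do not come from the paper.

```python
def bh_reject(pvalues, f):
    """Benjamini-Hochberg step-up rule at FDR level f.

    Returns the (sorted) indices of the rejected hypotheses.
    """
    m = len(pvalues)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value is below rank * f / m.
        if pvalues[i] <= rank * f / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = bh_reject(pvals, f=0.05)
print(rejected)
```

Because the rule keeps the largest qualifying rank, a hypothesis can be rejected even when a smaller p-value ahead of it misses its own threshold.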
The independence assumption among m test statistics was relaxed to independence only among m0 test statistics corresponding to the null hypotheses by Storey and Tibshirani (2001) and to weak independence among all m test statistics by Storey and Tibshirani (2003) and Storey et al. (2004). These approaches are implemented in the statistical package called SAM (see Storey and Tibshirani, 2003).
3 SAMPLE SIZE CALCULATION
Let
At the design stage of a study, m is decided by the microarray chips chosen for experiment and m1,
(1) Choose s1 and s2 such that 0 < s1 < s2 and h1h2 < 0, where hk = h(sk) for k = 1,2. (If h1h2 > 0 and h1 > 0, then choose a smaller s1; if h1h2 > 0 and h2 < 0, then choose a larger s2.)
(2) For s3 = (s1 + s2)/2, calculate h3 = h(s3).
(3) If h1h3 < 0, then replace s2 and h2 with s3 and h3, respectively. Else, replace s1 and h1 with s3 and h3, respectively. Go to (2).
(4) Repeat (2) and (3) until |s1 − s3| < 1 and |h3| < 1, and obtain the required sample size n = [s3] + 1, where [s] is the largest integer smaller than s.
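The bisection search can be sketched in Python as below, using the function h(n) given in the summary of this section. The sketch makes several assumptions that are not stated in this extract: z_α is taken to be the upper 100α-th percentile of the standard normal distribution, the function names (`h`, `bisect_sample_size`) and the default starting bracket are hypothetical, and a tighter tolerance `tol` replaces the stopping rule stated above (|s1 − s3| < 1 and |h3| < 1).

```python
from math import floor, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def h(n, deltas, r1, f, m, a1):
    """Expected number of true rejections at sample size n, minus the target r1."""
    m0 = m - len(deltas)
    a2 = 1.0 - a1
    alpha_star = r1 * f / (m0 * (1.0 - f))
    # Assumption: z_alpha is the upper 100*alpha-th percentile of N(0, 1).
    z_alpha = -_STD_NORMAL.inv_cdf(alpha_star)
    upper_tail = lambda x: 1.0 - _STD_NORMAL.cdf(x)
    return sum(upper_tail(z_alpha - d * sqrt(n * a1 * a2)) for d in deltas) - r1


def bisect_sample_size(deltas, r1, f, m, a1, s1=1.0, s2=100.0, tol=1e-6):
    """Solve h(n) = 0 by bisection and return floor(root) + 1."""
    if not 0 < r1 < len(deltas):
        raise ValueError("r1 must lie strictly between 0 and m1")
    args = (deltas, r1, f, m, a1)
    # Widen the bracket until h(s1) <= 0 <= h(s2); h is increasing in n.
    while h(s1, *args) > 0:
        s1 /= 2.0
    while h(s2, *args) < 0:
        s2 *= 2.0
    h1 = h(s1, *args)
    while s2 - s1 > tol:
        s3 = (s1 + s2) / 2.0
        h3 = h(s3, *args)
        if h1 * h3 <= 0:      # root lies in [s1, s3]
            s2 = s3
        else:                 # root lies in (s3, s2]
            s1, h1 = s3, h3
    return floor((s1 + s2) / 2.0) + 1


# Hypothetical mixed effect sizes: 20 genes with delta = 0.5 and 20 with delta = 1.
mixed = [0.5] * 20 + [1.0] * 20
n_mixed = bisect_sample_size(mixed, r1=24, f=0.05, m=4000, a1=0.5)
print(n_mixed)
```

With equal effect sizes the root of h coincides with the closed-form expression, which gives a simple consistency check on the search.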
In summary, our sample size calculation proceeds as follows:
Specify the input parameters:
f = FDR level;
r1 = number of true rejections;
ak = allocation proportion for group k(=1,2);
m = total number of genes for testing;
m1 = number of prognostic genes (m0 = m − m1);
\(\{\delta_j, j\in\mathcal{M}_1\}\) = effect sizes for prognostic genes.
Obtain the required sample size:
- If the effect sizes are constant, δj = δ for \(j\in\mathcal{M}_1\),
\[n=\left[\frac{(z_{\alpha^*}+z_{\beta^*})^2}{a_1a_2\delta^2}\right]+1,\]
where α* = r1f/{m0(1 − f)} and β* = 1 − r1/m1.
- Otherwise, solve h(n) = 0 using the bisection method, where
\[h(n)=\sum_{j\in\mathcal{M}_1}\overline{\Phi}(z_{\alpha^*}-\delta_j\sqrt{na_1a_2})-r_1\]
and α* = r1f/{m0(1 − f)}.
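For the equal effect size case, the closed-form calculation can be written in a few lines. The sketch below assumes that z_α denotes the upper 100α-th percentile of the standard normal distribution (the definition is not retained in this extract) and implements [s] as the floor of s; the function name `sample_size_equal_effects` is hypothetical.

```python
from math import floor
from statistics import NormalDist


def sample_size_equal_effects(f, r1, a1, m, m1, delta):
    """Closed-form sample size n for constant effect size delta (one-sided tests)."""
    m0 = m - m1
    a2 = 1.0 - a1
    alpha_star = r1 * f / (m0 * (1.0 - f))
    beta_star = 1.0 - r1 / m1
    inv_cdf = NormalDist().inv_cdf
    # Assumption: z_alpha is the upper 100*alpha-th percentile of N(0, 1).
    z_alpha = -inv_cdf(alpha_star)
    z_beta = -inv_cdf(beta_star)
    return floor((z_alpha + z_beta) ** 2 / (a1 * a2 * delta ** 2)) + 1


n = sample_size_equal_effects(f=0.01, r1=36, a1=0.5, m=4000, m1=40, delta=1.0)
print(n)
```

For this setting, (a1, m1, δ, r1) = (0.5, 40, 1, 36) at FDR = 1%, the function returns 101, which agrees with the corresponding entry in the table of sample sizes.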
3.1 Two-sided tests
3.2 Exact formula based on t-distribution
4 NUMERICAL STUDIES
In order to investigate the accuracy of the proposed sample size formula, we conducted extensive simulation studies. We set m = 4000, m1 = 40 or 200, constant effect sizes δ = 0.5 or 1, and a1 = 0.5 or 0.7. We want r1 to be 30, 60 or 90% of m1 while controlling the FDR level at f = 1, 5 or 10% using one-sided p-values. Given a design setting, we first calculate the sample size n using formula (8), which is based on normal approximation, and then generate N = 5000 samples of size n from independent normal distributions under the same setting. From each simulation sample, the number of true rejections is counted while controlling the FDR at the specified level using Storey's approach discussed in Section 2 with λ = 0.5. The first, second and third quartiles, Q1, Q2 and Q3, of the observed true rejections,
Figure 1 displays the empirical distribution of
5 DISCUSSION
The microarray has been a major high-throughput assay method to display DNA or RNA abundance for a large number of genes concurrently. Discovery of the prognostic genes should be made taking multiplicity into account, but also with enough statistical power to identify important genes successfully. Owing to the costly nature of microarray experiments, however, often only a small sample size is available and the resulting data analysis does not give reliable answers to the investigators. If the findings from a small study look promising, a large-scale study may be developed to confirm the findings using appropriate statistical tools. Our sample size formula can play a role at the design stage of such a confirmatory study. It can also be used to check the statistical power, r1/m1, of a small-scale pilot study.
The proposed method calculates the sample size for a specified number of true rejections (or the expected number of true rejections given a sample size) while controlling the FDR at a given level. The input variables to be pre-specified are the total number of genes for testing m, the projected number of prognostic genes m1, the allocation proportions ak between groups and the effect sizes for the prognostic genes. The method does not require any heavy computation, such as Monte Carlo simulations, so a sample size is obtained within a second. In particular, if the effect sizes among the prognostic genes are the same, we have a closed-form formula that can be calculated using a scientific calculator and a normal distribution table. The proposed method can be used to design a new study based on the parameter values estimated from the pilot data.
It is shown through simulations that the formula based on normal approximation works well overall, even when the expression levels are weakly correlated or have skewed distributions. If there exists dependency among the genes, the observed number of true rejections tends to have a wide variation around the nominal r1. The computer program for sample size calculation is available from the author.
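A simulation check of the kind described in Section 4 can be sketched as follows. This is a simplified illustration rather than the author's program: test statistics are drawn directly from the normal approximation N(δ√(n a1 a2), 1) instead of being computed from subject-level data, genes are independent, and the rejection rule is a Storey-type q-value threshold with the null proportion estimated at λ = 0.5. The function names, the seed and the number of replications (100 instead of 5000) are hypothetical choices.

```python
import random
from math import erfc, sqrt
from statistics import median


def upper_tail(x):
    """Standard normal survival function, 1 - Phi(x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def true_rejections(n, m, m1, delta, a1, f, lam, rng):
    """Simulate one study and count true rejections at estimated FDR level f."""
    shift = delta * sqrt(n * a1 * (1.0 - a1))
    # Genes 0..m1-1 are prognostic (mean shift), the rest are null.
    pvals = [upper_tail(rng.gauss(shift if j < m1 else 0.0, 1.0)) for j in range(m)]
    # Estimate of the proportion of null genes from p-values above lam.
    pi0 = min(1.0, sum(p > lam for p in pvals) / ((1.0 - lam) * m))
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        # Estimated FDR when rejecting the `rank` smallest p-values.
        if pi0 * m * pvals[j] / rank <= f:
            k = rank
    return sum(1 for j in order[:k] if j < m1)


rng = random.Random(2005)
counts = [
    true_rejections(n=101, m=4000, m1=40, delta=1.0, a1=0.5, f=0.01, lam=0.5, rng=rng)
    for _ in range(100)
]
print(median(counts))
```

Under these assumptions, the median count should fall close to the target r1 = 36 for which n = 101 was calculated.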
Table 1. Outcomes of m multiple tests
| True hypothesis | Accepted: Null | Accepted: Alternative | Total |
|---|---|---|---|
| Null | A0 | R0 | m0 |
| Alternative | A1 | R1 | m1 |
| Total | A | R | m |
The bisection procedure for Example 2
| Step | s1 | s2 | s3 | h1 | h2 | h3 |
|---|---|---|---|---|---|---|
| 1 | 100.0 | 200.0 | 150.0 | −4.67 | 3.59 | 0.13 |
| 2 | 100.0 | 150.0 | 125.0 | −4.67 | 0.13 | −1.85 |
| 3 | 125.0 | 150.0 | 137.5 | −1.85 | 0.13 | −0.80 |
| 4 | 137.5 | 150.0 | 143.8 | −0.80 | 0.13 | −0.32 |
| 5 | 143.8 | 150.0 | 146.9 | −0.32 | 0.13 | −0.09 |
| 6 | 146.9 | 150.0 | 148.4 | −0.09 | 0.13 | 0.02 |
| 7 | 146.9 | 148.4 | 147.7 | −0.09 | 0.02 | −0.04 |
Table 3. Sample size n for r1 (= 30, 60 or 90% of m1) true rejections at FDR = 1, 5 or 10% level by one-sided tests when m = 4000, m1 = 40 or 200, δ = 0.5 or 1, a1 = 0.5 or 0.7
| a1 | m1 | δ | r1 | FDR = 1% | 5% | 10% |
|---|---|---|---|---|---|---|
| 0.5 | 40 | 0.5 | 12 | 12 (9, 15)/195 | 12 (9, 14)/152 | 12 (8, 14)/133 |
| | | | 24 | 24 (22, 26)/269 | 24 (21, 26)/216 | 24 (21, 26)/192 |
| | | | 36 | 36 (35, 37)/404 | 36 (35, 37)/337 | 36 (35, 37)/306 |
| | | 1 | 12 | 13 (10, 16)/49 | 13 (10, 16)/38 | 14 (11, 17)/34 |
| | | | 24 | 25 (22, 27)/68 | 24 (22, 27)/54 | 24 (22, 27)/48 |
| | | | 36 | 36 (35, 37)/101 | 36 (35, 37)/85 | 36 (35, 37)/77 |
| | 200 | 0.5 | 60 | 62 (56, 68)/152 | 61 (55, 68)/110 | 62 (55, 69)/92 |
| | | | 120 | 121 (115, 126)/216 | 120 (114, 126)/163 | 121 (115, 127)/140 |
| | | | 180 | 180 (177, 183)/337 | 180 (177, 183)/268 | 180 (177, 183)/236 |
| | | 1 | 60 | 67 (61, 73)/38 | 71 (64, 78)/28 | 72 (65, 78)/23 |
| | | | 120 | 121 (115, 127)/54 | 122 (117, 128)/41 | 123 (117, 129)/35 |
| | | | 180 | 180 (177, 183)/85 | 179 (176, 182)/67 | 180 (176, 183)/59 |
| 0.7 | 40 | 0.5 | 12 | 12 (9, 14)/232 | 11 (9, 14)/181 | 11 (8, 14)/158 |
| | | | 24 | 24 (22, 26)/320 | 24 (21, 26)/257 | 24 (21, 26)/228 |
| | | | 36 | 36 (35, 37)/481 | 36 (35, 37)/401 | 36 (35, 37)/364 |
| | | 1 | 12 | 13 (10, 15)/58 | 13 (10, 15)/46 | 14 (11, 16)/40 |
| | | | 24 | 24 (22, 27)/80 | 24 (22, 27)/65 | 24 (22, 27)/57 |
| | | | 36 | 36 (35, 37)/121 | 36 (35, 37)/101 | 36 (35, 37)/91 |
| | 200 | 0.5 | 60 | 62 (55, 68)/181 | 61 (55, 68)/131 | 62 (55, 69)/110 |
| | | | 120 | 121 (115, 127)/257 | 120 (114, 126)/194 | 119 (114, 126)/166 |
| | | | 180 | 180 (177, 183)/401 | 180 (177, 183)/319 | 180 (177, 183)/281 |
| | | 1 | 60 | 65 (59, 72)/46 | 64 (57, 70)/33 | 71 (65, 78)/28 |
| | | | 120 | 122 (116, 128)/65 | 121 (114, 126)/49 | 122 (115, 128)/42 |
| | | | 180 | 180 (177, 183)/101 | 180 (177, 183)/80 | 180 (177, 183)/71 |
Each cell consists of Q2(Q1,Q3)/n, where n is the required sample size, and Q1,Q2 and Q3 are the first, second and third, respectively, quartiles of the observed number of true rejections from 5000 simulations.
Sample size n for r1 (= 30 or 60% of m1) true rejections at FDR = 1, 5 or 10% level by two-sided P-values when m = 7000, m1 = 50 or 100, a1 = 0.7
| m1 | r1/m1 | FDR = 1% | 5% | 10% |
|---|---|---|---|---|
| 50 | 0.3 | 39 | 32 | 29 |
| | 0.6 | 58 | 47 | 42 |
| 100 | 0.3 | 47 | 36 | 32 |
| | 0.6 | 69 | 56 | 50 |
The effect sizes are estimated from Golub et al. (1999) data.
Simulation results from normal or \(\chi_2^2\)-mixture distributions
| r1 | Normal: FDR = 1% | Normal: 5% | Normal: 10% | \(\chi_2^2\)-mixture: FDR = 1% | \(\chi_2^2\)-mixture: 5% | \(\chi_2^2\)-mixture: 10% |
|---|---|---|---|---|---|---|
| 12 | 11 (6, 16) | 12 (6, 18) | 13 (7, 19) | 12 (7, 17) | 13 (8, 19) | 15 (9, 21) |
| 24 | 20 (16, 25) | 22 (17, 27) | 23 (17, 28) | 20 (15, 24) | 22 (17, 27) | 23 (18, 28) |
Other parameters are set at (a1,m1,δ) = (0.5,40,1) and r1 = 12 or 24. Each cell consists of Q2(Q1,Q3), the quartiles of the observed number of true rejections from 5000 simulations. The sample sizes are given in Table 3 under the same setting for (a1,m1,δ,r1).
The author wants to thank the two reviewers for their valuable comments.
REFERENCES
Benjamini, Y. and Hochberg, Y.
Benjamini, Y. and Yekutieli, D.
Black, M.A. and Doerge, R.W.
Cui, X. and Churchill, G.A.
Gadbury, G.L., et al.
Genovese, C. and Wasserman, L.
Golub, T.R., et al.
Jung, S.H., et al.
Lee, M.L.T. and Whitmore, G.A.
Müller, P., et al.
Pan, W., et al.
Storey, J.D.
Storey, J.D. and Tibshirani, R. Technical Report 2001-28.
Storey, J.D. and Tibshirani, R.
Storey, J.D., et al.
van den Oord, E.J.C.G. and Sullivan, P.F.
Figure: Distribution of the observed number of true rejections, \(\widehat{r}_1\), from 5000 simulations under (a1,m1,δ) = (0.5,40,0.5) and (r1,FDR) = (12,0.01) in (a); (12,0.1) in (b); (36,0.01) in (c); (36,0.1) in (d).