MixTwice: large-scale hypothesis testing for peptide arrays by variance mixing

Abstract

Summary: Peptide microarrays have emerged as a powerful technology in immunoproteomics, as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in the reproducibility and power of current methods, we advance an empirical Bayesian tool that computes local FDR statistics and local false sign rate statistics when provided with data on estimated effects and estimated standard errors from all the measured peptides. As the name suggests, the MixTwice tool involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Constrained optimization techniques provide for model fitting of the mixing distributions under weak shape constraints (unimodality of the effect distribution). Numerical experiments show that MixTwice can accurately estimate generative parameters and powerfully identify non-null peptides. In a peptide array study of rheumatoid arthritis, MixTwice recovers meaningful peptide markers in one case where the signal is weak, and has strong reproducibility properties in one case where the signal is strong.

Availability and implementation: MixTwice is available as an R software package at https://cran.r-project.org/web/packages/MixTwice/.

Supplementary information: Supplementary data are available at Bioinformatics online.

the remaining L components are for the variance mixing probabilities $h = (h_l)$. The quantities $c_{i,k,l}$ depend on the data and the support points, but not on the probabilities g and h.
Constraints are critical to the optimization; of course, all elements of g and h must be nonnegative, and each vector must sum to unity. We also impose a unimodality constraint on g. But in deploying the augmented Lagrangian method, these constraints act on the differentiable function $l(g, h) = \sum_i \log D_i$, with $D_i = \sum_{k=1}^{2K+1} \sum_{l=1}^{L} c_{i,k,l}\, g_k\, h_l$, which we consider initially as varying freely over $(2K+1)+L$-dimensional Euclidean space. The gradient of $l(g, h)$ is a column vector of length $(2K+1)+L$ with the following format:
$$
\nabla l(g, h) = \left( \frac{\partial l}{\partial g_1}, \ldots, \frac{\partial l}{\partial g_{2K+1}}, \frac{\partial l}{\partial h_1}, \ldots, \frac{\partial l}{\partial h_L} \right)^{\top},
$$
where each component has the explicit form:
$$
\frac{\partial l}{\partial g_k} = \sum_i \frac{\sum_{l=1}^{L} c_{i,k,l}\, h_l}{D_i},
\qquad
\frac{\partial l}{\partial h_l} = \sum_i \frac{\sum_{k=1}^{2K+1} c_{i,k,l}\, g_k}{D_i}.
$$
The Hessian of $l(g, h)$ is a $(2K+1)+L$ by $(2K+1)+L$ matrix:
$$
H = \begin{pmatrix} A & B \\ B^{\top} & C \end{pmatrix},
$$
where matrix A ($2K+1$ by $2K+1$) contains second derivatives with respect to g, matrix C (L by L) contains second derivatives with respect to h, and matrix B ($2K+1$ by L) contains mixed second derivatives with respect to g and h.

For entries of matrix A:
$$
A_{k,k'} = -\sum_i \frac{\left( \sum_{l} c_{i,k,l}\, h_l \right) \left( \sum_{l} c_{i,k',l}\, h_l \right)}{D_i^2}.
$$
For entries of matrix C:
$$
C_{l,l'} = -\sum_i \frac{\left( \sum_{k} c_{i,k,l}\, g_k \right) \left( \sum_{k} c_{i,k,l'}\, g_k \right)}{D_i^2}.
$$
For entries of matrix B:
$$
B_{k,l} = \sum_i \left[ \frac{c_{i,k,l}}{D_i} - \frac{\left( \sum_{l'} c_{i,k,l'}\, h_{l'} \right) \left( \sum_{k'} c_{i,k',l}\, g_{k'} \right)}{D_i^2} \right].
$$
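To make these computations concrete, here is a minimal R sketch that evaluates $l(g, h)$, its gradient and the three Hessian blocks from a precomputed array of the constants $c_{i,k,l}$. The function and variable names (loglik_and_derivs, cc) are ours, for illustration only, and are not part of the MixTwice package interface.

```r
## Minimal sketch: log-likelihood, gradient and Hessian blocks.
## cc is an n x (2K+1) x L array holding the constants c_{i,k,l};
## g and h are the mixing probability vectors.
loglik_and_derivs <- function(cc, g, h) {
  G  <- apply(cc, c(1, 2), function(v) sum(v * h))  # G[i,k]  = sum_l c_{i,k,l} h_l
  Hm <- apply(cc, c(1, 3), function(v) sum(v * g))  # Hm[i,l] = sum_k c_{i,k,l} g_k
  D  <- as.vector(G %*% g)                          # D_i = sum_{k,l} c_{i,k,l} g_k h_l

  loglik <- sum(log(D))
  grad   <- c(colSums(G / D), colSums(Hm / D))      # (dl/dg, dl/dh)

  A <- -crossprod(G / D)   # A[k,k'] = -sum_i G[i,k] G[i,k'] / D_i^2
  C <- -crossprod(Hm / D)  # C[l,l'] = -sum_i Hm[i,l] Hm[i,l'] / D_i^2
  B <- apply(cc / D, c(2, 3), sum) - t(G / D) %*% (Hm / D)
  ## Full Hessian, if needed: rbind(cbind(A, B), cbind(t(B), C))
  list(loglik = loglik, grad = grad, A = A, B = B, C = C)
}
```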

Random subsampling
The optimization to compute $\hat g$ and $\hat h$ becomes computationally challenging as the number of testing units increases. MixTwice provides an option to fit the mixing distributions on a randomly selected subset of the testing units. Here we illustrate that the resulting compute-time improvements come with relatively little degradation in the quality of the estimates.
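In outline, the subsampling option fits the mixing distributions on a random fraction of the units, and the fitted $\hat g$ and $\hat h$ are then used to compute local statistics for all units. A minimal sketch of the idea, in which fit_fun stands in for the estimation step (we pass it as an argument rather than guessing the package's interface):

```r
## Sketch of the random-subsampling option: fit on a random fraction
## of the testing units. thetahat and se are the estimated effects and
## standard errors for all n units; fit_fun is the user's fitting routine.
subsample_fit <- function(thetahat, se, fit_fun, prop = 0.5) {
  n   <- length(thetahat)
  idx <- sample.int(n, size = ceiling(prop * n))  # random subset of units
  fit_fun(thetahat[idx], se[idx])                 # g-hat and h-hat from the subset
}
```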
We use the CCP+RF+ RA example to illustrate the random subsampling properties in terms of estimation error and computational benefit. We evaluate the discrepancy in distribution estimation and the user's CPU time (on an Intel(R) Core(TM) i5-7400HQ processor) as prop, the proportion of testing units used to fit the distributions, varies. As the metric of discrepancy we use the 1-Wasserstein distance between cumulative distribution functions, taking the estimate at prop = 0.5 as the benchmark (a small helper implementing this metric is sketched below). Supplementary Figure S1, Panel C shows the computational benefits.

Supplementary Figure S1: How does random subsampling influence estimation accuracy and computational efficiency? Panel A shows the estimates $\hat g$, $\hat h$ when various proportions of the units are used for estimation. Panel B shows the 1-Wasserstein discrepancy (between the estimate at that proportion and the estimate from half the units) as a function of the subsampling proportion. Panel C shows the corresponding CPU time.
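For distributions supported on a common grid, the 1-Wasserstein distance reduces to the integral of the absolute difference between the two cumulative distribution functions. A minimal helper (ours, not part of the MixTwice package):

```r
## 1-Wasserstein distance between two discrete distributions on the
## same support grid: W1 = integral of |F1(x) - F2(x)| dx.
wasserstein1 <- function(grid, p1, p2) {
  o    <- order(grid)
  grid <- grid[o]; p1 <- p1[o]; p2 <- p2[o]
  F1 <- cumsum(p1)  # CDF of first estimate
  F2 <- cumsum(p2)  # CDF of second estimate
  sum(abs(F1 - F2)[-length(grid)] * diff(grid))
}
```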

On identifiability
On units i with a fixed, known standard error σ, the mixing model for effects $\theta_i$ puts point mass at 0, with probability $\pi_0$, and distributes the remaining mass according to some distribution $g_{\rm alt}$, which in the following is treated as a density function with respect to Lebesgue measure. Ignoring mixing over σ (according to h), the predictive density of the estimator $\hat\theta_i$, at argument x, is
$$
f(x) = \pi_0\, \frac{1}{\sigma}\,\varphi\!\left(\frac{x}{\sigma}\right) + (1-\pi_0) \int \frac{1}{\sigma}\,\varphi\!\left(\frac{x-\theta}{\sigma}\right) g_{\rm alt}(\theta)\, d\theta,
$$
where φ is the standard normal density reflecting Gaussian errors of the estimators. The alternative effect density $g_{\rm alt}$ need not be zero in neighborhoods of the null, in which case it may happen that there exists a gap $c = c_\sigma > 0$ for which, for all x,
$$
\int \frac{1}{\sigma}\,\varphi\!\left(\frac{x-\theta}{\sigma}\right) g_{\rm alt}(\theta)\, d\theta \;\ge\; c_\sigma\, \frac{1}{\sigma}\,\varphi\!\left(\frac{x}{\sigma}\right). \qquad (1)
$$
If there is a gap, the alternative predictive density contains within it a shrunken version of the null. The problem with such a gap is well known: an amount $c_\sigma(1 - \pi_0)$ of mass from the alternative predictive component may be pushed into the null component, with no effect on the marginal predictive density. A small gap emerges in cases such as spiky (Figure 1, main), where $g_{\rm alt}$ concentrates substantial mass near the null value. This constitutes an identifiability issue; however, we find that the gap is small or nonexistent in many cases, and in any case can be shown to converge to zero as σ converges to zero. To see this feature, rearrange (1) to see that for all x we require
$$
c_\sigma \;\le\; \int \exp\!\left\{ \frac{1}{2\sigma^2}\left( 2\theta x - \theta^2 \right) \right\} g_{\rm alt}(\theta)\, d\theta.
$$
The bound on the right depends on x; differentiating in x under the integral gives
$$
\int \frac{\theta}{\sigma^2} \exp\!\left\{ \frac{1}{2\sigma^2}\left( 2\theta x - \theta^2 \right) \right\} g_{\rm alt}(\theta)\, d\theta.
$$
Notice that if $g_{\rm alt}$ is symmetric, then at x = 0 this derivative is zero, and so
$$
c_\sigma \;\le\; \int \exp\!\left( -\frac{\theta^2}{2\sigma^2} \right) g_{\rm alt}(\theta)\, d\theta.
$$
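As a concrete illustration (our example, not taken from the paper), suppose $g_{\rm alt}$ is a normal density with mean zero and variance $\tau^2$. The bound then has a closed form that vanishes with σ:
$$
c_\sigma \;\le\; \int \exp\!\left(-\frac{\theta^2}{2\sigma^2}\right) \frac{1}{\tau\sqrt{2\pi}} \exp\!\left(-\frac{\theta^2}{2\tau^2}\right) d\theta \;=\; \frac{\sigma}{\sqrt{\sigma^2 + \tau^2}} \;\longrightarrow\; 0 \quad \text{as } \sigma \to 0.
$$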
Taking limits as σ tends to 0 shows that $c_\sigma$ must vanish, which will happen with increasing amounts of information per unit: since $\exp(-\theta^2/2\sigma^2) \to 0$ for every $\theta \neq 0$, dominated convergence forces the bound, and hence $c_\sigma$, to zero. The question of mixing over σ using the second mixing distribution h is not directly addressed by the above computations. However, we would predict from them that as the estimated mixing distribution $\hat h$ concentrates more of its mass on small standard errors, inferences about the effects $\theta_i$ will become ever more reliable.
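As a numerical companion to this argument, the gap can be evaluated as the infimum over x of the ratio between the alternative predictive density and the null predictive density. The sketch below again uses the illustrative Gaussian alternative $g_{\rm alt} = N(0, \tau^2)$ from above (our choice, not the paper's), for which the alternative predictive density is $N(0, \sigma^2 + \tau^2)$:

```r
## Numerical gap c_sigma for the illustrative Gaussian alternative
## g_alt = N(0, tau^2); the alternative predictive density is then
## N(0, sigma^2 + tau^2).
gap_sigma <- function(sigma, tau, xs = seq(-5, 5, by = 0.01)) {
  f_alt  <- dnorm(xs, sd = sqrt(sigma^2 + tau^2))  # alternative predictive density
  f_null <- dnorm(xs, sd = sigma)                  # null predictive density
  min(f_alt / f_null)                              # infimum of the ratio on the grid
}
gap_sigma(sigma = 1,   tau = 0.5)  # ~0.89: sizable gap for a spiky g_alt
gap_sigma(sigma = 0.1, tau = 0.5)  # ~0.20: the gap shrinks as sigma decreases
```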