Fast approximate inference for multivariate longitudinal data

Summary Collecting information on multiple longitudinal outcomes is increasingly common in many clinical settings. In many cases it is desirable to model these outcomes jointly. However, in large data sets with many outcomes, the computational burden often prevents the simultaneous modeling of multiple outcomes within a single model. We develop a mean field variational Bayes algorithm to jointly model multiple Gaussian, Poisson, or binary longitudinal markers within a multivariate generalized linear mixed model. Through simulation studies and clinical applications (in the fields of sight-threatening diabetic retinopathy and primary biliary cirrhosis), we demonstrate substantial computational savings of our approximate approach when compared to a standard Markov chain Monte Carlo approach, while maintaining good accuracy in the estimated model parameters.

1.3 Derivation of $q^*(a_k)$
We require, for each $k = 1, \dots, q$,
$$\mathbb{E}_{-a_k}[\log p(y, \theta)] = \mathbb{E}_{-a_k}[\log p(\Sigma_R \mid a_1, \dots, a_q)] + \mathbb{E}_{-a_k}[\log p(a_k)] + \text{const}.$$

1.4 Derivation of $q^*(\beta, u)$
When all of the longitudinal markers are continuous (and assumed Gaussian), it is relatively straightforward to show that $q^*(\beta, u)$ is multivariate normal. However, once any of the longitudinal markers are binary or counts (assumed to be Poisson), we can no longer obtain $q^*(\beta, u)$ in a convenient closed form.
Instead, we take the approach of Rohde and Wand (2016) and specify that $q(\beta, u)$ is a $N(\mu_{q(\beta,u)}, \Sigma_{q(\beta,u)})$ density in order to gain tractability. By Result 2 of Rohde and Wand (2016), the necessary updates for $\mu_{q(\beta,u)}$ and $\Sigma_{q(\beta,u)}$ are
$$\Sigma_{q(\beta,u)} \leftarrow -\left\{ H_{\mu_{q(\beta,u)}} S\big(\mu_{q(\beta,u)}, \Sigma_{q(\beta,u)}\big) \right\}^{-1}, \qquad \mu_{q(\beta,u)} \leftarrow \mu_{q(\beta,u)} + \Sigma_{q(\beta,u)} \, D_{\mu_{q(\beta,u)}} S\big(\mu_{q(\beta,u)}, \Sigma_{q(\beta,u)}\big)^{T},$$
where $D_x f$ is the derivative vector of $f$ with respect to $x$, $H_x f$ is the Hessian matrix of $f$ (its matrix of second derivatives with respect to $x$), and $S$ is the non-entropy, which in our case is the expectation under $q(\beta, u)$ of the joint log-density of $y$ and $(\beta, u)$. See Rohde and Wand (2016) for a more general definition of the non-entropy. In the following we make use of a change of variables. Note that, by our specification, $C\nu \sim N(C\mu_{q(\beta,u)}, C\Sigma_{q(\beta,u)}C^T) = N(\mu, \sigma^2)$ elementwise. We therefore define the new variable $x = (C\nu - \mu)/\sigma$, so that $x \sim N(0, 1)$. Then, from Section 1.2, and using Rules 3.3.6 and 3.3.2 of Wand (2002), the derivative vector and Hessian of the non-entropy can be written in terms of expectations of the form $\mathbb{E}[b'(\sigma x + \mu)]$ and $\mathbb{E}[b''(\sigma x + \mu)]$, which gives the Rohde and Wand updates for $q^*(\beta, u)$ in the MGLMM defined in our paper. It remains to define the cumulant function and its derivatives $b(\cdot)$, $b'(\cdot)$ and $b''(\cdot)$ for each of the Gaussian, Poisson and binary longitudinal markers. Note that the derivative $b'(\sigma x + \mu)$ is taken with respect to the argument $(\sigma x + \mu)$; the chain rule is applied in the overall notation.
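To make the structure of these updates concrete, the following minimal numpy sketch performs one fixed-point update, assuming for illustration a unit-dispersion, natural-link exponential-family likelihood $\log p(y \mid \nu) = y^T C\nu - \mathbf{1}^T b(C\nu) + \text{const}$ with combined design matrix $C$ and a fixed Gaussian prior precision $P$ on $(\beta, u)$; the function and argument names are ours, not the paper's implementation.

```python
import numpy as np

def rohde_wand_update(mu, Sigma, C, y, P, b1, b2):
    """One Rohde-Wand fixed-point update for q(beta, u) = N(mu, Sigma).

    b1(m, s2) and b2(m, s2) return E[b'(m + s*Z)] and E[b''(m + s*Z)]
    elementwise for Z ~ N(0, 1), i.e. the B-type integrals in the text.
    """
    m = C @ mu                                    # elementwise means of C @ nu
    s2 = np.einsum("ij,jk,ik->i", C, Sigma, C)    # diag(C Sigma C^T)
    grad = C.T @ (y - b1(m, s2)) - P @ mu         # D_mu of the non-entropy
    hess = -C.T @ (b2(m, s2)[:, None] * C) - P    # H_mu of the non-entropy
    Sigma_new = np.linalg.inv(-hess)              # Sigma <- -(H_mu S)^{-1}
    mu_new = mu + Sigma_new @ grad                # mu <- mu + Sigma * D_mu S
    return mu_new, Sigma_new
```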
If $Y_r$ is a Gaussian marker, then $b(\sigma x + \mu) = \tfrac{1}{2}(\sigma x + \mu)^2$, and the required expectations are available in closed form. If $Y_r$ is a Poisson marker, then $b(\sigma x + \mu) = e^{\sigma x + \mu}$, and closed forms are again available since $\mathbb{E}[e^{\sigma x + \mu}] = e^{\mu + \sigma^2/2}$. If $Y_r$ is a binary marker, then $b(\sigma x + \mu) = \log(1 + e^{\sigma x + \mu})$. Defining $\text{expit}(x) = \text{logit}^{-1}(x) = e^x/(1 + e^x)$, and with the help of integration by parts, the required expectations can be written in terms of the integrals $B_0(\mu, \sigma^2)$ and $B_1(\mu, \sigma^2)$. Following Nolan and Wand (2017), we avoid quadrature routines by approximating $B_0(\mu, \sigma^2)$ and $B_1(\mu, \sigma^2)$ using a weighted mixture of normal distributions with the Monahan-Stefanski weights and $k = 8$ mixture components. The necessary MFVB updates for each marker then follow, and can be substituted back into the Rohde and Wand updates using the relevant rows for each marker.
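As a reference for the quantities being approximated, $B_0(\mu, \sigma^2)$ and $B_1(\mu, \sigma^2)$ (and the corresponding second-derivative integral) can be evaluated by Gauss-Hermite quadrature. This numpy sketch is only a numerical check, not the Monahan-Stefanski normal-mixture approximation used in the algorithm; the function name and quadrature order are ours.

```python
import numpy as np

def B(r, mu, sigma2, n_quad=30):
    """E[b^(r)(mu + sigma*Z)], Z ~ N(0, 1), for the logistic cumulant
    b(x) = log(1 + exp(x)), evaluated by Gauss-Hermite quadrature.
    mu and sigma2 are numpy arrays of elementwise means and variances."""
    x, w = np.polynomial.hermite.hermgauss(n_quad)
    z = mu[:, None] + np.sqrt(2.0 * sigma2)[:, None] * x[None, :]
    if r == 0:
        vals = np.logaddexp(0.0, z)            # b(z), numerically stable
    elif r == 1:
        vals = 1.0 / (1.0 + np.exp(-z))        # b'(z) = expit(z)
    else:
        p = 1.0 / (1.0 + np.exp(-z))
        vals = p * (1.0 - p)                   # b''(z) = expit(z){1 - expit(z)}
    return (vals @ w) / np.sqrt(np.pi)         # E[f(mu + sigma*Z)]
```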

Streamlining
The updates for $\Sigma_{q(\beta,u)}$ involve calculating the inverse of a large matrix. However, this matrix has a block sparse structure, and we can exploit this structure, together with standard matrix algebra results, to streamline our algorithm and make substantial speed gains. This streamlining procedure is outlined in more detail in Lee and Wand (2016); here we give only the details relevant to our model. The inverse of $\Sigma_{q(\beta,u)}$ is partitioned with the fixed effects in the leading block, each individual's random effects in their own diagonal block, and zero blocks between different individuals.
Using Corollary 8.5.12 of Harville (2008), the required blocks of the inverse can be obtained directly from this partition. The updates involving $\Sigma_{q(\beta,u)}$ can now be streamlined: most calculations are performed only on the blocks of the data containing the rows for a particular individual, which substantially improves computation time. Specifically, simple algebraic manipulations yield the updates for $\mu_{q(\beta,u)}$ and $B_{q(\sigma^2_{\varepsilon r})}$ given in Algorithm 1 of our main manuscript. We can also streamline the calculation of $\tfrac{1}{2}\log|\Sigma_{q(\beta,u)}|$, which appears in the marginal log lower bound $\log p(y; q)$, using Theorem 13.3.8 of Harville (2008).
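As an illustration of the kind of computation this enables, the following numpy sketch inverts a two-level "arrowhead" precision matrix blockwise, touching only one individual's blocks at a time. The block names A00, Bs and As are hypothetical, and this is a sketch of the Lee and Wand (2016) style of streamlining rather than a transcription of Algorithm 1.

```python
import numpy as np

def arrowhead_inverse_blocks(A00, Bs, As):
    """Blocks of Sigma = A^{-1} for the arrowhead partition
    A = [[A00, B_1, ..., B_m], [B_1^T, A_1, 0, ...], ...],
    without forming the full matrix, plus log|Sigma|."""
    Ainv = [np.linalg.inv(Ai) for Ai in As]          # small per-individual solves
    schur = A00 - sum(Bi @ Av @ Bi.T for Bi, Av in zip(Bs, Ainv))
    Sigma00 = np.linalg.inv(schur)                   # fixed-effects block
    Sigma0i = [-Sigma00 @ Bi @ Av for Bi, Av in zip(Bs, Ainv)]
    Sigmaii = [Av + Av @ Bi.T @ Sigma00 @ Bi @ Av
               for Bi, Av in zip(Bs, Ainv)]
    # |A| = |Schur| * prod_i |A_i|, so log|Sigma| = -log|A|
    logdet_Sigma = -(np.linalg.slogdet(schur)[1]
                     + sum(np.linalg.slogdet(Ai)[1] for Ai in As))
    return Sigma00, Sigma0i, Sigmaii, logdet_Sigma
```

The cost of this computation grows linearly in the number of individuals, rather than cubically in the full dimension of $(\beta, u)$.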

Derivation of a lower bound on the marginal log-likelihood
We control the convergence of our MFVB algorithm by monitoring the lower bound on the marginal log-likelihood, $\log p(y; q)$. This is calculated as follows.
Here, we use $y_G$, $y_P$ and $y_B$ to denote the stacked Gaussian, Poisson and binary longitudinal observations, respectively. The numbers of observations of each type of marker are denoted by $n_G$, $n_P$ and $n_B$, respectively. We also make use of $n \times n$ diagonal matrices $E_1$, $E_2$ and $E_3$, with value 1 on the rows corresponding to Gaussian, Poisson and binary markers, respectively, and 0 elsewhere. We now provide an expression for each part of $\log p(y; q)$ and then state the full expression.
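A minimal sketch of how these pieces might be wired together is given below; the indicator-matrix encoding, stopping tolerance and function names are our own illustration, not the paper's implementation.

```python
import numpy as np

def indicator_matrices(marker_type):
    """Diagonal indicator matrices E1, E2, E3 selecting the rows of the
    stacked response belonging to Gaussian, Poisson and binary markers;
    marker_type is a length-n array of labels (illustrative encoding)."""
    E1 = np.diag((marker_type == "gaussian").astype(float))
    E2 = np.diag((marker_type == "poisson").astype(float))
    E3 = np.diag((marker_type == "binary").astype(float))
    return E1, E2, E3

def run_mfvb(update_step, log_p_lower, state, tol=1e-8, max_iter=500):
    """Iterate the coordinate updates until the relative change in the
    marginal log lower bound log p(y;q) falls below tol. The bound
    should be non-decreasing across iterations; a drop signals a bug."""
    old = -np.inf
    for _ in range(max_iter):
        state = update_step(state)
        new = log_p_lower(state)
        if np.isfinite(old) and abs(new - old) <= tol * abs(old):
            break
        old = new
    return state
```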

Additional simulation details
The final part of this supplement presents some accuracy results from the simulation study.