MDPbiome: microbiome engineering through prescriptive perturbations

Abstract

Motivation: Recent microbiome dynamics studies highlight the current inability to predict the effects of external perturbations on complex microbial populations. To do so would be particularly advantageous in fields such as medicine, bioremediation or industrial scenarios.

Results: MDPbiome statistically models longitudinal metagenomics samples undergoing perturbations as a Markov Decision Process (MDP). Given a starting microbial composition, our MDPbiome system suggests the sequence of external perturbation(s) that will engineer that microbiome to a goal state, for example, a healthier or more performant composition. It also estimates intermediate microbiome states along the path, thus making it possible to avoid particularly undesirable/unhealthy states. We demonstrate MDPbiome performance over three real and distinct datasets, proving its flexibility, and the reliability and universality of its output ‘optimal perturbation policy’. For example, an MDP created using a vaginal microbiome time series, with a goal of recovering from bacterial vaginosis, suggested avoidance of perturbations such as lubricants or sex toys; while another MDP provided a quantitative explanation for why salmonella vaccine accelerates gut microbiome maturation in chicks. This novel analytical approach has clear applications in medicine, where it could suggest low-impact clinical interventions that will lead to achievement or maintenance of a healthy microbial population, or alternately, the sequence of interventions necessary to avoid strongly negative microbiome states.

Availability and implementation: Code (https://github.com/beatrizgj/MDPbiome) and result files (https://tomdelarosa.shinyapps.io/MDPbiome/) are available online.

Supplementary information: Supplementary data are available at Bioinformatics online.


Markov Decision Process additional element
Sometimes, an additional element is also included in the MDP definition: the discount factor γ ∈ [0, 1), a constant typically close to 1 that indicates how important future rewards are compared with rewards obtained in the current state. We set γ = 0.9, a typical value for this setting.
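To illustrate the role of the discount factor, the following is a minimal sketch of value iteration on a toy two-state MDP. It is not the MDPbiome implementation: the state names, action names, transition probabilities and rewards in `TOY_MODEL` are hypothetical values chosen only for illustration, and only γ = 0.9 comes from the text.

```python
def expected_return(outcomes, values, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(model, gamma=0.9, tol=1e-9):
    """Solve a small MDP given as model[state][action] = [(prob, next_state, reward), ...].

    Returns the state values and the greedy (optimal) policy.
    With gamma < 1 and bounded rewards the iteration converges.
    """
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            best = max(expected_return(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: expected_return(actions[a], values, gamma))
        for s, actions in model.items()
    }
    return values, policy


# Hypothetical toy model: reaching the "healthy" state yields reward 1.
TOY_MODEL = {
    "dysbiotic": {
        "none": [(0.8, "dysbiotic", 0.0), (0.2, "healthy", 1.0)],
        "perturb": [(0.4, "dysbiotic", 0.0), (0.6, "healthy", 1.0)],
    },
    "healthy": {
        "none": [(0.9, "healthy", 1.0), (0.1, "dysbiotic", 0.0)],
        "perturb": [(0.6, "healthy", 1.0), (0.4, "dysbiotic", 0.0)],
    },
}

values, policy = value_iteration(TOY_MODEL, gamma=0.9)
```

With γ = 0 only the immediate expected reward counts, whereas values of γ close to 1 let rewards reached several steps later influence the choice of action in the current state.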

Datasets
Vaginal microbiome (Gajer et al., 2012): additional details
Gajer et al. obtained the clusters in the vaginal microbiome dataset by hierarchical clustering with Jensen-Shannon distance and Ward linkage, cutting the dendrogram at each k between 2 and 10 and keeping the k with the maximum silhouette within this range. The maximum silhouette was at k = 5, which yielded 5 states. The actions were collected by curated visual inspection of the individual profiles of vaginal bacterial community dynamics, from the 32 D-panels (one profile per woman) of Figure S5 in the supplementary material of Gajer et al. (2012), available online. We associated each external perturbation with the next sample taken, or with the sample taken on the same day if the two coincide; the perturbation is then considered the 'action' between the two samples.
Although Brotman et al. (2014) used continuous-time Markov models to examine the same dataset (Gajer et al., 2012), their approach differs in the type of Markov model (not an MDP), in the prediction goal (infection with human papillomavirus rather than bacterial vaginosis) and chiefly in that they did not correlate actions/perturbations with state-transitions.
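The rule for associating perturbations with samples, described above, can be sketched as follows. This is an illustrative sketch rather than the curation procedure actually used (which was a visual inspection): the function name `label_transitions`, the state labels `S1`/`S2`, the `"none"` label for intervals without a perturbation, and the joining of several perturbations with `+` are all assumptions.

```python
from bisect import bisect_left


def label_transitions(samples, perturbations, no_action="none"):
    """Turn one subject's timeline into (state, action, next_state) triples.

    samples:       list of (day, state), sorted by day
    perturbations: list of (day, perturbation_name)

    Each perturbation is attached to the first sample taken on or after
    its day; it then labels the transition that ends at that sample.
    """
    sample_days = [day for day, _ in samples]
    assigned = {i: set() for i in range(len(samples))}
    for day, name in perturbations:
        i = bisect_left(sample_days, day)  # first sample with sample day >= day
        # Assumption: perturbations after the last sample are dropped.
        if i < len(samples):
            assigned[i].add(name)

    transitions = []
    # Assumption: perturbations attached to the first sample are ignored,
    # because there is no earlier sample to start a transition from.
    for i in range(1, len(samples)):
        action = "+".join(sorted(assigned[i])) or no_action
        transitions.append((samples[i - 1][1], action, samples[i][1]))
    return transitions


# Hypothetical timeline for one subject.
samples = [(1, "S1"), (4, "S2"), (8, "S2"), (11, "S1")]
perturbations = [(3, "lubricant"), (8, "tampon"), (12, "lubricant")]
triples = label_transitions(samples, perturbations)
# triples == [("S1", "lubricant", "S2"), ("S2", "tampon", "S2"), ("S2", "none", "S1")]
```

Triples of this form are what distinguish an MDP from a plain Markov chain: each observed state-transition carries the action that preceded it.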

Results
Evaluating policy universality in vaginal microbiome
Figure S1 shows the results of evaluating the generality of the MDPbiome policies in the vaginal microbiome dataset for the different perturbations. It indicates that the lubricant and sex toy policies are very general: transitions to equal or better states are much more frequent when following the policy than when not following it (see bottom of the 4th and 7th pairs of columns, with shorter (F) and longer (nF) red bars). For anal sex, oral sex and tampon use, following the policy is better, while for digital penetration or vaginal intercourse the policy does not seem to be universal across subjects. In conclusion, the generality level of the policy depends on the microbiome dataset and the perturbation.

Figure S1: Frequency of categorized transitions when following or not following the optimal policy, in the vaginal microbiome. F: following the MDPbiome policy; nF: not following it. Better, equal and worse state-transitions are defined by sorting the states according to the inverse of their average Nugent score.
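The categorization summarized in Figure S1 can be sketched as below. This is a simplified illustration under stated assumptions, not the MDPbiome code: the policy is taken as given (for example, learned from the other subjects), and the function name `categorize_transitions`, the state labels and the average Nugent scores are hypothetical. A lower average Nugent score is treated as a better state, which is equivalent to sorting by its inverse.

```python
from collections import Counter

OUTCOMES = ("better", "equal", "worse")


def categorize_transitions(transitions, policy, avg_nugent):
    """Frequency of better/equal/worse transitions, split by policy adherence.

    transitions: list of (state, action, next_state)
    policy:      dict state -> recommended action
    avg_nugent:  dict state -> average Nugent score (lower is better)
    """
    counts = {"F": Counter(), "nF": Counter()}
    for state, action, next_state in transitions:
        group = "F" if policy.get(state) == action else "nF"
        if avg_nugent[next_state] < avg_nugent[state]:
            outcome = "better"
        elif avg_nugent[next_state] > avg_nugent[state]:
            outcome = "worse"
        else:
            outcome = "equal"
        counts[group][outcome] += 1

    frequencies = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        frequencies[group] = {
            o: (counter[o] / total if total else 0.0) for o in OUTCOMES
        }
    return frequencies


# Hypothetical held-out transitions for a lubricant-specific policy.
avg_nugent = {"S1": 1.0, "S2": 6.0}
policy = {"S1": "none", "S2": "none"}
held_out = [
    ("S1", "none", "S1"),
    ("S2", "none", "S1"),
    ("S1", "lubricant", "S2"),
    ("S2", "lubricant", "S2"),
]
freq = categorize_transitions(held_out, policy, avg_nugent)
```

In this toy example, `freq["F"]` contains no worse transitions while `freq["nF"]` does, which is the pattern read off the red bars for a policy that generalizes well.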

Additional comparison with MDSINE
Here we describe additional differences between MDPbiome and MDSINE, beyond those described in the main text. MDSINE requires both qPCR and 16S data as input, while MDPbiome requires only the latter. MDSINE uses a generalized Lotka-Volterra (gLV) model, to which a transformation (which they call 'gradient matching') is applied to simplify the ordinary differential equation problem to a linear problem. Given the MDSINE gLV model, its authors developed additional algorithms to make predictions and evaluate their model. Similarly, given the MDPbiome MDP model, we obtain predictions (the optimal policy) by applying an MDP solver to our model, and we also develop: 1) our Dirichlet-based algorithm to evaluate our system, resulting in the definition of our optimal policy stability rate metric, computed individually per action and state, and in aggregate; and 2) a leave-one-out cross-validation-based algorithm resulting in our generality metric, which measures how often following the optimal policy, compared with not following it, leads to an equal, better or worse state. Regarding perturbations, MDPbiome suggests the perturbation to apply in a given state, while MDSINE detects whether a perturbation (or interaction) occurred, by comparing models with and without perturbations, or whether the perturbation has an effect.
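One possible way to realize a Dirichlet-based stability check is sketched below. The text does not spell out the algorithm here, so this is an assumption rather than the MDPbiome procedure: transition probabilities are resampled from a Dirichlet distribution parameterized by the observed transition counts plus a prior, the MDP is re-solved for each sample, and the stability rate of a state is the fraction of samples in which its optimal action is unchanged. All names, counts and rewards are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def solve(probs, rewards, gamma=0.9, sweeps=200):
    """Greedy policy for probs[state][action][next_state] via value iteration."""
    values = {s: 0.0 for s in probs}

    def q(s, a):
        return sum(p * (rewards[s2] + gamma * values[s2])
                   for s2, p in probs[s][a].items())

    for _ in range(sweeps):
        for s in probs:
            values[s] = max(q(s, a) for a in probs[s])
    return {s: max(probs[s], key=lambda a: q(s, a)) for s in probs}


def stability_rate(counts, rewards, n_samples=200, prior=1.0, seed=0):
    """Fraction of resampled models whose optimal action matches the baseline."""
    rng = random.Random(seed)
    states = list(counts)

    def alphas(row):
        return [row.get(s2, 0) + prior for s2 in states]

    # Baseline model: posterior mean of each Dirichlet.
    mean_model = {
        s: {a: {s2: x / sum(alphas(row)) for s2, x in zip(states, alphas(row))}
            for a, row in acts.items()}
        for s, acts in counts.items()
    }
    baseline = solve(mean_model, rewards)

    agree = {s: 0 for s in states}
    for _ in range(n_samples):
        sampled = {
            s: {a: dict(zip(states, sample_dirichlet(alphas(row), rng)))
                for a, row in acts.items()}
            for s, acts in counts.items()
        }
        policy = solve(sampled, rewards)
        for s in states:
            agree[s] += policy[s] == baseline[s]
    return baseline, {s: agree[s] / n_samples for s in states}


# Hypothetical transition counts: counts[state][action][next_state].
COUNTS = {
    "dysbiotic": {"none": {"dysbiotic": 80, "healthy": 20},
                  "perturb": {"dysbiotic": 30, "healthy": 70}},
    "healthy": {"none": {"healthy": 90, "dysbiotic": 10},
                "perturb": {"healthy": 50, "dysbiotic": 50}},
}
REWARDS = {"dysbiotic": 0.0, "healthy": 1.0}
baseline_policy, rates = stability_rate(COUNTS, REWARDS)
```

A rate close to 1 for a state suggests that the recommended action there is robust to sampling noise in the observed transitions; sparse counts would be expected to lower it.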
In conclusion, we could not quantitatively compare the two systems, because their assumptions, model inputs and outputs differ, resulting in different and non-comparable metrics.