DIMet: an open-source tool for differential analysis of targeted isotope-labeled metabolomics data

Abstract
Motivation: Many diseases, such as cancer, are characterized by an alteration of cellular metabolism that allows cells to adapt to changes in the microenvironment. Stable isotope-resolved metabolomics (SIRM) and downstream data analyses are widely used techniques for unraveling cells' metabolic activity, in order to understand the altered functioning of metabolic pathways in the diseased state. While a number of bioinformatic solutions exist for the differential analysis of SIRM data, there is currently no available resource providing a comprehensive toolbox.
Results: In this work, we present DIMet, a one-stop comprehensive tool for differential analysis of targeted tracer data. DIMet accepts metabolite total abundances, isotopologue contributions, and isotopic mean enrichment, and supports differential comparison (pairwise and multi-group), time-series analyses, and labeling profile comparison. Moreover, it integrates transcriptomics and targeted metabolomics data through network-based metabolograms. We illustrate the use of DIMet on real SIRM datasets obtained from glioblastoma P3 cell-line samples. DIMet is open-source and readily available for routine downstream analysis of isotope-labeled targeted metabolomics data: it can be used either from the command line or as a complete toolkit on the public Galaxy Europe and Workflow4Metabolomics web platforms.
Availability and implementation: DIMet is freely available at https://github.com/cbib/DIMet, and through https://usegalaxy.eu and https://workflow4metabolomics.usegalaxy.fr. All the datasets are available at Zenodo https://zenodo.org/records/10925786.


S2 Expected directory structure to run DIMet from the command line
A specific folder structure is required to use DIMet from the command line. Supplementary Figure 2 details this folder structure.
Supplementary Figure 2: Using DIMet from the command line requires following the predefined folder structure for your data and specifying all the required parameters in the configuration files. Further details are provided on the Wiki page, alongside downloadable templates.
Supplementary Figure 2 can be read from top to bottom as a set of instructions, detailing all the necessary files and their locations. It covers the data itself, its metadata, the configuration files for the analyses to be run, etc.
Importantly, to facilitate the use of DIMet, we provide starter templates for all these files, downloadable from Zenodo. They can easily be modified to suit your own data and analyses. The format of each type of data file in the dataset subfolder is described in detail in the data-files section of the Wiki.

S3 Statistical analyses and their outputs
This section presents key concepts and recommendations regarding the statistical tests available in DIMet for comparing groups of samples (Supplementary Figure 1). With few exceptions, the mathematical details are beyond the scope of this document; we provide supplementary references for the interested reader.

S3.1 Univariate analyses
Goal of the analysis: to detect differences between samples from two conditions, that is, to evaluate whether the abundances or the labeling of the metabolites significantly increase or decrease between them.
Available univariate analyses:
a. Pairwise differential analysis: comparing 2 groups of samples
b. Time course analysis: comparing samples between consecutive time points
c. Multi-group analysis: comparing samples from more than 2 groups
General considerations for establishing the statistical significance. Targeted metabolomics data are not well suited to analysis with parametric statistical tests¹. Indeed, abundance values have the following characteristics: they take exclusively positive values, their distribution is not symmetrical, and their variance is not homogeneous across distinct groups of values. All these elements indicate that abundance values follow highly skewed continuous distributions, such as the gamma distribution. Moreover, not all bell-shaped distributions are normal: for example, proportions follow a beta distribution². Following these general considerations, the pairwise differential analysis and the time course analysis (a, b) share the same repertoire of tests for establishing statistical significance, whereas the multi-group analysis (c) uses the Kruskal-Wallis test. As a rule of thumb, we recommend the ranksum (Wilcoxon rank-sum) test, because data acquired in targeted metabolomics experiments often meet its conditions of use. It is also robust to outliers and heavy-tailed distributions.
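To make these properties concrete, the following minimal sketch simulates gamma-distributed abundances for two hypothetical groups and checks positivity, skewness and variance homogeneity with scipy.stats. The shape, scale and sample sizes are arbitrary assumptions chosen for illustration, not DIMet defaults.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated abundances of one metabolite in two hypothetical groups:
# same gamma shape, different scale, so the variances differ as well.
group_1 = rng.gamma(shape=2.0, scale=1.0, size=500)
group_2 = rng.gamma(shape=2.0, scale=3.0, size=500)

# Abundances are strictly positive.
print("all positive:", bool((group_1 > 0).all() and (group_2 > 0).all()))

# The theoretical skewness of a gamma distribution is 2 / sqrt(shape),
# about 1.41 here, so the sample skewness should be clearly positive.
print("sample skewness:", stats.skew(group_1), stats.skew(group_2))

# Levene's test for equality of variances (robust to non-normality);
# a small p-value indicates heterogeneous variances.
print("Levene p-value:", stats.levene(group_1, group_2).pvalue)
```

Each of these properties violates an assumption of the t-test, which is why rank-based tests are preferred here.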
Supplementary Table 1 shows the correspondence between the type of analysis and the statistical test. DIMet offers classical statistical tests from scipy.stats (ranksum: Wilcoxon rank-sum; Wcox: Wilcoxon signed-rank; MW: Mann-Whitney U; KW: Kruskal-Wallis; BrMu: Brunner-Munzel); additionally, DIMet implements disfit and prm-scipy, described in the same Supplementary Table.
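The three univariate designs can be sketched with the corresponding scipy.stats functions: ranksums for (a) and (b), and kruskal for (c). This is an illustration under assumed inputs, not DIMet's internal code: the group sizes, the time-point labels (T0h, T1h, T4h) and the simulated values are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical abundances of one metabolite: positive, right-skewed values.
control = rng.gamma(shape=2.0, scale=1.0, size=6)
treated = rng.gamma(shape=2.0, scale=3.0, size=6)

# (a) Pairwise differential analysis: two-sided Wilcoxon rank-sum test.
_, p_pair = stats.ranksums(control, treated)

# (b) Time course analysis: the same test on each pair of consecutive time points.
timepoints = {
    "T0h": rng.gamma(2.0, 1.0, 5),
    "T1h": rng.gamma(2.0, 1.5, 5),
    "T4h": rng.gamma(2.0, 4.0, 5),
}
labels = list(timepoints)
p_time = {}
for earlier, later in zip(labels[:-1], labels[1:]):
    _, p = stats.ranksums(timepoints[earlier], timepoints[later])
    p_time[(earlier, later)] = p

# (c) Multi-group analysis: Kruskal-Wallis H test across all groups at once.
_, p_multi = stats.kruskal(*timepoints.values())

print(f"pairwise ranksum p = {p_pair:.3g}")
for (earlier, later), p in p_time.items():
    print(f"{earlier} vs {later}: p = {p:.3g}")
print(f"Kruskal-Wallis p = {p_multi:.3g}")
```

In a full analysis, one such p-value is obtained per metabolite and the set is then corrected for multiple testing (see S3.3).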

S3.2 Bi-variate analyses
Goal of the analysis: to measure the correlation between two sets of values (e.g. two conditions or two time points), that is, to evaluate whether there is a significant relationship (linear for the Pearson test, monotonic for the Spearman test) between the abundances or the labeling of the metabolites in the two sets.
Available bi-variate analyses:
a. Mass distribution vector (MDV)³ profile comparison between 2 conditions
b. MDV profile comparison between 2 consecutive time points
c. Metabolite time-course profile (of total abundances and mean enrichment fractional contributions) comparison between 2 conditions
Specifics of the internal data processing for the bi-variate analysis. The following is performed for each metabolite (Supplementary Table 2). In (a, b), the MDV profiles, obtained from the isotopologue proportions, are compared between two conditions or two consecutive time points, respectively. In (c), using the total metabolite abundances (or fractional contributions), the two sets of time-wise values of the two conditions are compared. The user can choose between the Spearman and the Pearson correlation tests. The Spearman test computes the correlation coefficient (ρ) from the sum of the squared differences between the paired ranks, and the p-value is computed via the t statistic for each ρ value. In the Pearson test, the correlation coefficient (r) measures the linear relationship between the two sets of values (it equals the slope of the linear regression between the standardized values), and the p-value is likewise estimated via the t statistic. As a rule of thumb, the Spearman test is recommended, and it is the default option in DIMet.
¹ When using a parametric test (such as the t-test), the assumptions of normality, homoscedasticity (homogeneity of variance) and independence must be fulfilled. A normal distribution is bell-shaped AND symmetric about its mean (0 for the standard normal) AND takes negative as well as positive values.
² Under certain parameters, a beta distribution is bell-shaped and symmetrical with mean = 0.5.
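The relationship between the two correlation tests can be checked numerically. In the sketch below (hypothetical values, no ties), Spearman's ρ is recovered both as Pearson's r computed on the ranks and from the sum of squared rank differences, ρ = 1 - 6Σd²/(n(n² - 1)), and both p-values are reproduced from the t statistic with n - 2 degrees of freedom. The helper t_test_pvalue is introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical paired values (e.g. one profile per condition), without ties.
x = np.array([0.05, 0.10, 0.20, 0.40, 0.25])
y = np.array([0.02, 0.15, 0.18, 0.50, 0.12])
n = len(x)

rho, p_spearman = stats.spearmanr(x, y)
r, p_pearson = stats.pearsonr(x, y)

# Spearman's rho is Pearson's r computed on the ranks...
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
assert np.isclose(rho, r_on_ranks)

# ...and, without ties, it follows from the squared rank differences.
d = stats.rankdata(x) - stats.rankdata(y)
assert np.isclose(rho, 1 - 6 * np.sum(d**2) / (n * (n**2 - 1)))

def t_test_pvalue(c, n):
    """Two-sided p-value of a correlation coefficient c via
    t = c * sqrt((n - 2) / (1 - c**2)), with n - 2 degrees of freedom."""
    t = c * np.sqrt((n - 2) / (1 - c**2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

assert np.isclose(p_spearman, t_test_pvalue(rho, n))
assert np.isclose(p_pearson, t_test_pvalue(r, n))
print(f"Spearman rho = {rho:.2f} (p = {p_spearman:.3g}); Pearson r = {r:.2f} (p = {p_pearson:.3g})")
```

Because Spearman only uses ranks, it is insensitive to the skewness and outliers discussed in S3.1, which supports its use as the default.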

Type of analysis | Null hypothesis | Principle
MDV profile comparison between 2 conditions | The MDV profiles of the two conditions are not correlated | Using the isotopologue proportions that make up the metabolite MDV, correlates the set of values of the first condition with the set of values of the second condition.
MDV profile comparison between 2 consecutive time points | The MDV profiles of the two time points are not correlated | Same as above, but between the first and the second consecutive time points.
Metabolite total abundance and fractional contribution time-course profile comparison between 2 conditions | The time-course profiles of the two conditions are not correlated | Using the metabolite total abundance or fractional contribution, correlates the set of values (matched across the time points) of the first condition with that of the second condition.
Supplementary Table 2: The offered bi-variate analyses. Note that the set of values in each case is obtained by computing the geometric mean across the biological replicates. The chosen correlation test (Spearman by default) is run for each type of bi-variate analysis.
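A minimal sketch of the processing summarized in Supplementary Table 2 for analysis (a), assuming replicate-level isotopologue proportions are available as arrays: the geometric mean is taken across biological replicates to obtain one MDV profile per condition, and the two profiles are then correlated with the Spearman test. The arrays and the M+0 to M+3 layout are hypothetical, and DIMet's actual implementation may handle zeros or renormalization differently.

```python
import numpy as np
from scipy import stats

# Hypothetical isotopologue proportions of one metabolite:
# columns are M+0, M+1, M+2, M+3; one row per biological replicate.
condition_a = np.array([
    [0.60, 0.25, 0.10, 0.05],
    [0.55, 0.28, 0.12, 0.05],
    [0.58, 0.26, 0.11, 0.05],
])
condition_b = np.array([
    [0.20, 0.25, 0.40, 0.15],
    [0.22, 0.24, 0.39, 0.15],
    [0.18, 0.26, 0.41, 0.15],
])

# Geometric mean across replicates (axis=0): one MDV profile per condition.
# A zero proportion would force the geometric mean to 0; all values are positive here.
mdv_a = stats.gmean(condition_a, axis=0)
mdv_b = stats.gmean(condition_b, axis=0)

# Null hypothesis: the MDV profiles of the two conditions are not correlated.
rho, p_value = stats.spearmanr(mdv_a, mdv_b)
print("MDV A:", mdv_a.round(3))
print("MDV B:", mdv_b.round(3))
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
```

With only as many points as isotopologues (four here), the correlation p-value is necessarily coarse. Note also that a geometric-mean profile need not sum exactly to 1.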

S3.3 Multiple tests correction methods
Correction for multiple testing is available for both univariate and bi-variate analyses, using either the Bonferroni ("bonferroni") or the Benjamini-Hochberg ("fdr_bh") method. A general rule of thumb for choosing between the two is the following. If a stringent method is preferred and an inflation of false negatives is not a concern, the Bonferroni method, which controls the family-wise error rate, is recommended. In contrast, if the priority is to reduce the frequency of false negatives, the Benjamini-Hochberg method, which controls the false discovery rate, is recommended. The default option in DIMet is Benjamini-Hochberg ("fdr_bh"). For more information, see the statsmodels documentation (statsmodels.stats.multitest.multipletests).
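The two corrections can be compared directly with statsmodels' multipletests, whose method names "bonferroni" and "fdr_bh" correspond to the options above. The raw p-values below are hypothetical, one per metabolite.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values, one per metabolite.
pvals = np.array([0.001, 0.008, 0.020, 0.035, 0.300])

# multipletests returns (reject, pvals_corrected, alphacSidak, alphacBonf).
reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni adjusted:", p_bonf.round(4), "rejected:", int(reject_bonf.sum()))
print("BH adjusted:        ", p_bh.round(4), "rejected:", int(reject_bh.sum()))
```

Here Bonferroni rejects the null hypothesis for two metabolites and Benjamini-Hochberg for four, illustrating the trade-off between stringency and false negatives described above.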

S3.4 Output table of the DIMet univariate analyses
For each analysis, DIMet generates tab-delimited files as output. Supplementary Table 3 describes the content of the columns.

Note:
The distance/span (d/s) measures the distance between the two intervals of values corresponding to the compared groups, normalized by the global span of all the values. This metric takes values between -1 and 1. Negative d/s values indicate an overlap, meaning that this variable (metabolite) cannot serve as a biomarker to distinguish between the groups. Positive d/s values quantify the gap between the intervals: the closer to 1, the larger the gap. Groups that do not overlap reflect a reproducible difference and indicate that the variable is a potential biomarker. The d/s must be interpreted alongside the adjusted p-values and the fold changes. It is sensitive to outliers; if outliers are present, careful interpretation is required.
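Because the exact formula is not given here, the following is a hypothetical implementation of d/s consistent with the description above: each group is summarized by its [min, max] interval, the signed gap between the two intervals is divided by the span of the pooled values, and the result is negative for overlapping intervals and always lies in [-1, 1]. The function name distance_span and the choice of returning 0 for a zero span are illustrative assumptions, not DIMet's API.

```python
import numpy as np

def distance_span(group_a, group_b):
    """Hypothetical d/s: signed gap between the [min, max] intervals of two
    groups, divided by the span of all pooled values.
    Negative when the intervals overlap, positive when they are separated."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Lower bound of the upper interval minus upper bound of the lower one.
    gap = max(a.min(), b.min()) - min(a.max(), b.max())
    pooled = np.concatenate([a, b])
    span = pooled.max() - pooled.min()
    # Assumption: all values identical, so no meaningful span.
    return gap / span if span > 0 else 0.0

print(distance_span([1, 2], [5, 6]))       # separated groups
print(distance_span([1, 2, 3], [2, 3, 4])) # overlapping groups
print(distance_span([1, 2], [5, 6, 1.5]))  # same as the first, plus one outlier
```

For the separated groups the gap is 3 over a span of 5 (d/s = 0.6), while adding a single low value of 1.5 to the second group makes the intervals overlap and drops d/s to -0.1, which illustrates the sensitivity to outliers noted above.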

S3.5 Output table of the DIMet bi-variate analyses
The output table of the bi-variate analysis, performed with the chosen correlation test (Spearman or Pearson), contains the columns described in Supplementary Table 4.