TinderMIX: Time-dose integrated modelling of toxicogenomics data


 
 
 Omics technologies have been widely applied in toxicology studies to investigate the effects of different substances on exposed biological systems. A classical toxicogenomic study consists in testing the effects of a compound at different dose levels and different time points. The main challenge consists in identifying the gene alteration patterns that are correlated to doses and time points. The majority of existing methods for toxicogenomics data analysis allow the study of the molecular alteration after the exposure (or treatment) at each time point individually. However, this kind of analysis cannot identify dynamic (time-dependent) events of dose responsiveness.
 
 
 
 We propose TinderMIX, an approach that simultaneously models the effects of time and dose on the transcriptome to investigate the course of molecular alterations exerted in response to the exposure. Starting from gene log fold-change, TinderMIX fits different integrated time and dose models to each gene, selects the optimal one, and computes its time and dose effect map; then a user-selected threshold is applied to identify the responsive area on each map and verify whether the gene shows a dynamic (time-dependent) and dose-dependent response; eventually, responsive genes are labelled according to the integrated time and dose point of departure.
 
 
 
 To showcase the TinderMIX method, we analysed 2 drugs from the Open TG-GATEs dataset, namely, cyclosporin A and thioacetamide. We first identified the dynamic dose-dependent mechanism of action of each drug and compared them. Our analysis highlights that different time- and dose-integrated point of departure recapitulates the toxicity potential of the compounds as well as their dynamic dose-dependent mechanism of action.


Dear editor, please find enclosed a revised version of our manuscript. We addressed the comments raised by the reviewers. We would like to thank the reviewers for constructive feedback. We believe that this revised manuscript is significantly improved from the initial submission and we look forward to your positive feedback.
Yours Sincerely, Dario Greco PhD, Associate Professor, Faculty of Medicine and Health Technology Tampere University Finland ____________________________________________________________________________ Reviewer 1 General : In this paper, the authors propose a new approach, called TinderMIX, for simultaneously studying time and dose effects in toxicogenomics. Although many toxicogenomics experiments include data from different doses measured at different time points, only a few bioinformatics approaches to integrate time and dose effects exist, and their applications are limited to network identification and response modeling. The new method proposed by the authors extends existing methods by identifying genes with joined dose and time patterns of responsiveness, and predicting the molecular alteration values for the doses and time-points not included in the original experiment. Moreover, the approach assigns an activation label to each gene describing the time (early, middle, late) and dose effect (sensitive, intermediate, resilient). Genes are grouped based on the activation label, and pathway analysis can be performed on each group in order to better characterise the underlying biological mechanisms. Moreover, the method is available as a user-friendly R package on GitHub.
To the best of my knowledge, this is the first bioinformatics approach performing such an extensive analysis of both time and dose effects in toxicogenomics. This makes the method described in this manuscript of particular interest for improving our understanding of the mode of action of compounds. However, before I can recommend the manuscript for publication in GigaScience, a few issues have to be addressed: Major : * Materials and methods, section Time-dose Fitting: the authors explain that they use a desired activity threshold corresponding to a 10% increase/decrease with respect to controls. However, the authors have to explain why the threshold has been set to 10% and not to a higher or lower value.
We thank the reviewer for the useful remark. The threshold of 10% is the default threshold used in the literature for BMD analysis of transcriptomic data, as already published by Thomas  We added the following sentence in the "Time-dose effect map prediction and Dynamic dose-responsive genes identification" paragraph: FunMappOne tool.
We apologize for the missing reference and we added it. * Materials and methods, section Dataset collection: "For every drug pathological score was computed by …" -> "Every drug pathological score was computed by …" We amended the text accordingly to the reviewer suggestion * Materials and methods, section Dataset collection: "Cyclosporine A and thioacetamide were selected as representative candidates for drugs with low and high toxic impact, respectively." Add literature references that explain that Cyclosporine A is a drug with low toxic impact and thioacetamide a drug with high toxic impact.
By following the reviewer suggestions we added the following references for Cyclosporine A and Thioacetamide: We added the reference and link to the website for the Open TG-Gates dataset * Materials and methods, section Dataset collection: "12 are replicates for each dose level". Are these biological or technical replicates? Please explain already in this paragraph that you use the in vivo data.
We clarified in the manuscript that the data come from in vivo rat liver tissue and that they are biological replicates. * Caption Table 1: "Linear is the number of DDR that is fitted by a liner, poly2 and poly3 are the numbers of DDR genes that are fitted by a linear, 2nd and 3rd order polynomial function respectively." -> "Linear, Poly2 and Poly3 are the numbers of DDR genes that are fitted by a linear, 2nd and 3rd order polynomial function respectively." We amended the table caption We thank the reviewer for the useful comment and we added the missing labels to both figure 3 and 4.
* Additional files 2 and 3: I suggest to explain the abbreviations in the column names of the Excel tables in the description of the additional files.
We acknowledge that the supplementary files 2 and 3 were a little bit messy. We removed the redundant columns and we added a description of the column names in the description of the additional files as follows: "The file contains the following information: (1) dose_time_comparison: specifies if the activation is more dependent on the dose or the time; (2) Gene Description: is a text description of the gene; (3) Gene Symbol; (4) Joint Label: is the POD label; (5) gene_sign: specifies if the gene activation is increasing or decreasing with respect to the dose; (6) MeanFC: mean log-fold-change of the gene in the POD area; (7)  We added the missing labels to both figure 3 and 4. * Caption Figure 4: "direct correlation between the increase of the fold-change and the dose": authors refer to the purple and green bars. However, there are two types of green in the figures. Do you mean the blue green bars?
We thank the reviewer for pointing this out. Colors are only related to the position of the bars in the four quadrants. In this particular case, we refer only to the bars that are on the right side of the Y axis. * The terms "Dynamic Dose−Responsive Increasing Area" and "Dynamic Dose−Responsive Decreasing Area" only appear in the caption of Figure 6. It would be more clear if these terms were already explained in the overview of the methods (Figure 1c + corresponding text).
We thank the reviewer and we modified the caption of Figure 1 in order to better describe the increasing and decreasing behaviour of the dynamic dose responsive areas.
We corrected the typo ____________________________________________________________________________ Reviewer 2 The paper introduces a new toxicogenomics data modelling approach which considers simultaneously the effect of both time and dose in the calculation of points of departure. The method is applied on individual genes, but using enrichment analysis it is generalized and extended to the level of pathways and mechanisms of action.
The proposed method is novel, very interesting to the readers of GigaScience and paves the way for new developments in the area of chemicals safety and risk assessment by considering dose and time simultaneously. Therefore, I am happy to recommend the acceptance of the paper.
I have some minor comments that may contribute to further improve the manuscript and the presentation of the methodology: 1) The selected linear regression models include 1st , 2nd and 3rd order polynomials. These are not the usual models used in the standard BMD method (Gamma, Logistic models etc.). Is there any particular reason for this choice?
We thank the reviewer for raising this commnet. We would like to point out that polynomials models are used in classical BMD analysis of transcriptomics data, as already indicated in previous studies: Thomas Serra, A., Saarimäki, L. A., Fratello, M., Marwah, V. S., & Greco, D. BMDx: a graphical Shiny application to perform Benchmark Dose analysis for transcriptomics data. Bioinformatics.
The main idea of this work is to propose a methodology that jointly models time and dose to highlight the importance of the dynamics in the POD identification. Even though polynomial functions are not the only models used in BMD analysis, they can be used for reliable approximation of other functions.
Differently from the polynomials, the gamma and logistic models do not have a straightforward or unique generalization to more than one input variables. This would require a deeper evaluation of their applicability in the integrated time-dose modelling of gene expression data that might be done in further studies.
"These models are selected as representative of those used in classical BMD analysis Serra, 2000]. Indeed, unlike the gamma and logistic models, polynomials are easily generalised to more than one input variables and they can be used for reliable approximation of other functions." 2) It is mentioned in p. 3 that the desired activity threshold is a 10% increase or decrease with respect to controls. How this is compared with Fig. 1 where the threshold seems to be higher than 10%?
We first want to clarify that the maps in figure 1 are a mock-up used to explain how to read and interpret them, but they are not calculated on real data. Moreover, a difference in the expression of the treatment vs the controls of at least 10% corresponds to a fold-change of at least 1.1 that equals a log2FC of ~0.13. In figures 1C-1D, the first log2FC value above the threshold is slightly less than 0.15, which corresponds to a fold-change of 2^0.15 = ~1.11. This corresponds to an increase of ~11%, which is above the defined threshold.
3) What should be the minimum percentage of a region in the 3x3 grid belonging to the green (or red) area that defines it as a POD? For example, if a very small part of the sensitive-early region was green, it is still enough to specify it as a POD?
We thank the reviewer for raising this important point.
In the analysis shown as the case study of our manuscript, we decided to focus on PODs at the lowest dose and earliest time of exposure. This is in line with a general reasoning in toxicology, where we are interested in identifying the PODs based on the absolute sensitivity of the biological system (in time and dose). However, we acknowledge that there are some marginal cases where the POD area that we identify can have a small coverage, but we have seen that this happens rarely and the labelling strategy works pretty well. Also, It is important to note, that these labels are meant to give an indication of the POD and help in grouping the genes and give an interpretation but they are not self explanatory of the all gene expression dynamic.
Nonetheless, we implemented multiple options to assign labels to the genes in the TinderMIX library. In particular, we offer the following alternative strategies: Most left: a gene is labelled with the labe of the region containing the lowest dose and early time point of the dynamic-dose-responsive area presence: a gene is labeled with all the labels of the regions covered by the dynamicdose-responsive area. Cumulative: a gene is labeled with the labels of the regions needed to cover x% of the border of the dynamic-dose-responsive area. X is a threshold given in input by the user Mix: for each gene, a combined score is created for each region of the 3x3 grid. The score takes into account the proximity with the (0,0) coordinate, the coverage of the area and the number of points of the border of the dynamic-dose-responsive area contained in the region. Finally the gene is labeled according to the region with the higher score.
We have added a new Figure 2 in the manuscript, describing the different labeling Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation strategies and a "POD label assignment" subsection in the Materials and Method section that contains the following text: "The dose-time space is partitioned into 3X3 regions, where each dose can be labelled as sensitive, intermediate or resilient, and each time point is labelled as early, middle or late. Different strategies to identify the POD of the dynamic-dose-responsive genes are implemented in TinderMix. The first method is called "most left" strategy. It looks for the first available POD region by identifying the most sensitive point of activation (lowest dose) and then its corresponding earliest time point. The second strategy, called "presence", identifies as PODs all the regions containing the DDRA border marked as a yellow line (Figure \ref{fig:method}). The third method, called "cumulative", ranks the regions in the 3X3 scheme based on the percentage of points of the border of the DDRA that they contain. Then the regions needed to cover the X\% of the border are identified and marked as POD. X is a threshold identified by the user. In case of X = 100, the cumulative method gives the same result of the presence method. The fourth method, called "mix", creates a score for each region that takes into account how close the region is to the lowest dose and earliest time point, how much area of the region is covered by the DDRA and how many points of the DDRA border are contained in it. The region with the higher score is selected as POD. When the most sensitive and most early region is also the one with higher coverage, this method and the most left one give the same results. In all the approaches the final POD label is obtained by concatenating the dose and time of the single labels. For example, a gene with label sensitive-early is a gene that shows a response already at low doses and early time points. On the other hand, a resilient-late gene shows a response only at high doses and late time points. It is important to highlight that the labeling strategies are meant to give an indication of the POD, but they are not self explanatory of the whole gene expression dynamic. The presence and cumulative approaches identify multiple regions as POD giving a better approximation of the shape of the DDRA area. These two approaches are suggested when the focus of the study is on a few genes. On the other hand, the most left and mix approaches identify only the starting point of the gene activation and can be easily used to group genes and give biological interpretation of chain of events. In the analyses conducted in this work we have used the most left approach because it follows the toxicological assumption that a toxicant is considered active at the lowest dose and earliest time point at which its effect is significantly deviating from the control status" Different methodologies implemented to identify the dynamic-dose-response POD on a sample gene map. With the "most left" method (A) the POD area is marked as Sensitive-Early, since the activation DDRA is already present at the lowest dose and shortest time-point. With the "cumulative" method (B), the POD is marked as Sensitive-Early, Sensitive-Middle, and Sensitive-Late. The label is obtained by using 80% of the cumulative length of the DDRA border (yellow line). With the "presence" method (C) the POD is labeled as Sensitive-Early, Sensitive-Middle, Sensitive-Late, Intermediate-Early, and Resilient-Early. With the "mix" method (D) the POD is labeled as Sensitive-Middle since the DDRA starts already at lower doses and shorter time points and the area is also the one with a higher coverage by the DDRA. 4) On the POD selection again, could there be a nonlinear case where the earlysensitive region is completely blue, but the early-intermediate and middle-sensitive are green? In this case, which region is selected as the POD?
The way in which we implemented the most left labelling strategy is that we identify the smallest possible dose that is contained in the dynamic-dose-responsive area. Then, for that dose we identify the earliest time point. So in the case highlighted by the reviewer, the label would be assigned as sensitive-middle. That was a missing reference to funmappone that we now added to the manuscript. p. 7 Caption of Table 1 is not syntactically correct.
We rephrased the caption of table 1 as follows: "The number of dynamic dose-responsive genes. The DDR is the number of dynamic dose-responsive genes. Linear, Poly2 and Poly3 are the numbers of DDR genes that are fitted by a linear, 2nd and 3rd order polynomial function respectively." p.7 Caption of Figure 3 , it refers several times to figure numbers starting with 2. I suggest to include the sub-figure numbers (3A, 3B,….) next to each sub-figure.
We thank the reviewer, we corrected the figure numbers and we added the sub-figure letters p.8 Caption of Figure 4. Similar to previous comment on Figure 3.
We thank the reviewer, we corrected the figure numbers and we added the sub-figure letters

Introduction
Toxicogenomic methods are widely used for the assessment of chemical hazards and environmental health [1]. Omics technologies have been broadly accepted and recognized as ecient and reproducible tools to study the e ects of chemical exposures on di erent organisms [2]. In particular, transcrip-tomics technologies allow the investigation of gene expression patterns of thousands of genes after exposure to a substance [3] and they have been widely used in toxicogenomics [4,5,6,7].
Direct e ects of chemical insults are generally expected to follow a monotonic dose-response alteration resulting in a more impactful e ect as the dose increases until a plateau is reached [8]. In classical toxicology, this is observed as an in-Compiled on: April 21, 2020. Draft manuscript prepared by the author.

1
Click here to access/download;LaTeX -Main Document (TeX file);main.tex 2 | GigaScience, 2020, Vol. 00, No. 0 creasing number of deceased animals or any other measurable apical endpoint. A similar e ect can be appreciated in gene expression alteration. Since the changes in gene expression occur much earlier than apical endpoints, transcriptomic data can be used as a complementary approach to estimate relevant points of departure (POD) and hence provide valuable information for the toxicological assessment [9,10].
Until now, the no-observed-adverse-e ect-level (NOAEL) [11] and the benchmark dose (BMD) [12] methods have been used to help the interpretation of transcriptomic dose-response data and to derive transcriptomic PODs. In particular, by modelling the patterns of dose-responsiveness of gene expression, the BMD method provides an estimate of the lowest dose of a chemical able to induce a signi cant change in biological activity.
However, assaying the mechanism of action of exposure at multiple time points is valuable to highlight useful information on the kinetic molecular responses of a biological system. For example, the experimental setup for the in vivo studies in the Open TG-GATEs involves animal treatment in which each drug is tested at three doses (low, medium and high) at multiple time-points [7]. Thus, one major challenge is the interpretation of these toxicogenomics data, especially for identifying patterns of alterations of the gene expression that are correlated both to the dose levels and the time points. Di erent methods have been proposed to identify genes that show a dose-response e ect [13,14] or to study the gene expression dynamics at individual time points [15,16]. However, this kind of approach does not allow an easy interpretation of the time-dependent dynamics happening after the exposure and the molecular adaptation process. To date, few e orts have been done to propose a methodology able to identify sets of genes that show similar expression patterns with respect to both dose and time [17].
In this work, we propose a new computational framework, TinderMIX, for dose-and time-dependent gene expression analysis that aims to combine dimensionality reduction, BMD analysis, and polynomial tting to nd groups of genes that show a dynamic dose-response (DDR) behaviour. The integrated modelling used in TinderMIX allows us to interpolate the continuous joint dose-time space and predict the molecular alteration values for the doses and time-points not included in the original experiment. Moreover, our approach is importantly able to inform on POD in both the dimensions of the doses and time points, hence resolving at once two analytical tasks that, thus far, have only been carried out subsequently to each other.
To illustrate our methodology we analyzed the gene expression data for cyclosporine A and thioacetamide from the Open TG-GATEs database.

Materials and Methods
The TinderMIX methodology proposed in this study starts from the sample-wise log-fold-change of the genes and is able to identify which of them show a DDR behaviour and to estimate their joint dose-time POD. The methodology is composed of multiple steps that can be grouped into two parts: the genes modelling with POD identi cation ( Figure 1) that is executed for every gene in the dataset, and the POD interpretation of the dynamic dose-responsive genes (DDRG) identi ed in the rst part ( Figure 1).
As for the classical BMD analysis, the rst step of the gene modeling analysis ( Figure 1A) consists of tting di erent polynomial functions to each gene log-fold-change and identifying the optimal one by using a Nested Model Hypothesis test [18]. The di erence with the classical BMD analysis performed so far is that the tted models are three-dimensional (3D) functions in the space of the dose, log-fold-change, and time. Indeed, so far, the BMD analysis has mainly been performed in the dose and log-fold-change space for each time point individually. Afterward, the optimal 3D model is mapped in the dose-time dimensional space by means of contour plots (Figure 1B). A monotonically increasing or decreasing (with respect to the doses) dynamic dose-responsive area (DDRA) is identied starting from a user-speci ed activation threshold. (Figure 1C). If this DDRA is present, the gene is considered DDR and his POD is identi ed based on the rst dose level and activation time ( Figure 1D). Moreover, the e ect of dose and time in the DDRA is compared in order to understand if the modications in the log-fold-changes values are due to a positive or negative e ect of the time and dose and also to understand which of the two variables contribute most( Figure 1E). Indeed, if the time has a stronger e ect compared to the dose we can hypothesize that the e ect under analysis can be an adaptation process, on the other hand, if the dose contributes the most we can hypothesize that we are observing an e ect directly correlated to the exposure. In the second part of the analysis, the DDR genes identi ed can be grouped according to their POD labels ( Figure 1F). Eventually, overrepresented pathways can be calculated for each gene category ( Figure 1G).

Time-dose Fitting
For every gene, linear regression models including 1st, 2nd and 3rd order polynomials (equations 1-3) of the explanatory Figure 1. TinderMIX methodology. The TinderMIX methodology is composed of several steps that are grouped into gene modeling and POD interpretation. First, starting from the pairwise log-fold-change of each gene a 3D polynomial model is tted (A). Second, the 3D model is mapped in its 2D e ect map by means of contour plots. The lines in the contour plots describe the shape of the 3D model by showing the portion of the space where the tted model has a constant log-fold-change, indicated by the value on the lines. A positive value means that the gene is up-regulated otherwise it is down-regulated (B). Third, the dynamicdose-responsive area (DDRA) is identi ed. Starting from a user-de ned activity threshold, the time-dose e ect map is divided into regions of di erent colors. A blue region is a part of the map where the log-fold-change is lower than the threshold, thus marked as a non-active area. A green or red area indicates an activation, with a log-fold-change greater than the threshold. In particular, a green area (called Dynamic Dose-Responsive Increasing Area) is characterized by an increase of the log-fold-change when the dose increases, while a red area (called Dynamic Dose-Responsive Decreasing Area) presents a decrease of log-the fold-change when the dose increases. In the example shown in (D) the highlighted area is colored in green indicating an increasing of the log-fold-change with respect of the dose. A gene is considered DDR, if there is a DDRA in the activity map, with a monotonic behavior of the log-fold-change from a dose to the highest one tested. In the map, the front of the dose-responsive area is marked with a yellow line. Moreover, a red line is used to mark the IC50 front, which is the line connecting the set of doses (one for each time point) that speci es the 50% of gene activity (C). Fourth, based on the rst dose and time of activation the gene is labelled with an activation label that speci es its dose and time POD. The time-dose e ect map is divided into a 3 by 3 grid. The sections of the dose axis were named "S" (sensitive), "I" (intermediate), and "R" (resilient), while for the time axis the labels "E" (early), "M" (middle), and "L" (late), were assigned. The nal label was then obtained by identifying the earliest and most sensitive point of activation and concatenating the dose and time of the single labels (D). Fifth, the e ect of the dose and time in the DDRA is compared (E). The second part of the analysis consists of the POD interpretation. The letters d and t stand for dose and time, respectively. In the label, a capital letter means that there is a stronger e ect of a variable with respect to the other. If the two letters are both capitals it means that the e ect is of similar intensity. Concordance in the increase of the fold change and dose/time is indicated with a +, while decrease with a -. Thus, the DDRGs identi ed are grouped by their POD labels (F) and eventually a pathway enrichment analysis for the genes in each category is performed (G).
variables (log-fold-change) are tted. These models are selected as representative of those used in classical BMD analysis [8,13,12]. Indeed, unlike the gamma and logistic models, polynomials are easily generalised to more than one input variables and they can be used for reliable approximation of other functions.
The tting was performed by using the R lm function from the stats package [19]. The best-tting model is selected performing a Nested Model Hypothesis test [18]. The test is performed using the R function ANOVA, which takes a list of models as an input. Each model in the list is considered as the "full" model with respect to the previous "restricted" model in the list, where a number of partial regression coe cients are set to zero. In other words, a model is restricted, with respect to the full model, in the sense that it does not consider one or more explanatory variables. For each pair of subsequent models in the list, we test the signi cance of the amount of variance explained by a subset of predictors of the restricted model, compared to the variance explained by all the variables of the full model. We also add to the test, at the head of the list of models, an implicit constant model in which all the partial regression coe cients are zero, except for the intercept β 0 . This is done to assess the quality of the t in the rst place. If the p-values corresponding to testing the models in Eq. 1-3 against the constant model are not signi cant, that gene is not considered as having a dose-response e ect. If, on the other hand, there is a signi cant amount of variance explained by any of the models in Eq. 1-3, then we return the tted model corresponding to the lowest p-value.

Time-dose e ect map prediction and Dynamic doseresponsive genes identi cation
For each gene, the selected model is used to predict an activity map with the values of a smooth log-fold-change function on a grid of 50 x 50 points covering the entire range of doses and time-points tested. This map is represented as a contour plot and used to identify the dose-response area. A desired activity threshold corresponding to a 10% increase/decrease with respect to controls is set to identify the responsive area of each gene time-dose e ect map. If a gene does not show an activity satisfying the threshold it is removed from the analysis. A gene is also removed from the analysis if the responsive area does not include the highest dose of the experimental setting. The selected threshold of 10% is a default threshold used in BMD analysis of transcriptomics data [8,13,12]. For the genes passing the previous ltering, the gradient of the smooth log-foldchange function is computed at each point of the 50 x 50 grid. Each point of the grid is segmented according to if its connected components show a monotonically (with respect to the whole dose range) increment or decrement of the log-fold-change values. If only one component, either with increasing or decreasing behaviour, is present in the map and it contains the highest dose, then the component is selected as the candidate DDRA. However, if the highest dose is not contained in the area, then the gene is considered not responding and removed from the analysis. On the other hand, if multiple components with di erent increasing or decreasing behaviour, are present in the map, one of the two is selected as candidate dose-responsive area, according to the criteria described in additional le 1. In particular, each candidate region is weighted according to the following criteria: Where n p is the number of points included in the region, n tp−md is the number of time-points (rows in the gene map) that are included in the candidate region that include the highest dose and m d is the minimum dose covered by the candidate region and n is the number of candidate regions found in the gene maps. The optimal region is selected as the one maximizing the score. The active region is further reduced, by removing time-points that are still active (log-fold-change > 1.1) outside the optimal region but with a non-monotonic behaviour compared to the one inside the optimal region.
We identify the external borders of the segmented responsive area using a trace-boundary algorithm. The doseresponsive front is identi ed as the smallest dose present in the responsive area for each timepoint. Moreover, the IC50 front is also computed as the doses that give 50% of changes in the log-fold-change at every time-point.

POD Label assignment
The dose-time space is partitioned into 3 × 3 regions, where each dose can be labelled as sensitive, intermediate or resilient, and each time point is labelled as early, middle or late. Different strategies to identify the POD of the dynamic-doseresponsive genes are implemented in TinderMix ( Figure 2). The rst method is called "most left" strategy ( Figure 2A). It looks for the rst available POD region by identifying the most sensitive point of activation (lowest dose) and then its corresponding earliest time point. The second strategy, called "presence", identi es as PODs all the regions containing the DDRA border marked as a yellow line ( Figure 2B). The third method, called "cumulative", ranks the regions in the 3 × 3 scheme based on the percentage of points of the border of the DDRA that they contain. Then the regions needed to cover the X% of the border are identi ed and marked as POD. X is a threshold speci ed by the user ( Figure 2C). In case of X = 100, the "cumulative" method gives the same result of the "presence method". The fourth method, called "mix", creates a score for each region that takes into account how close the region is to the lowest dose and earliest time point, how much area of the region is covered by the DDRA and how many points of the DDRA border are contained in it. The region with the higher score is selected as POD ( Figure 2D). When the most sensitive and most early region is also the one with higher coverage, the "mix" method and the "most left" one give the same results. In all the approaches the nal POD label is obtained by concatenating the dose and time of the single labels. For example, a gene with label sensitive-early is a gene that shows a response already at low doses and early time points. On the other hand, a resilient-late gene shows a response only at high doses and late time points. It is important to highlight that the labeling strategies are meant to give an indication of the POD, but they are not self explanatory of the whole gene expression dynamic. The "presence" and "cumulative" approaches identify multiple regions as POD giving a better approximation of the shape 4 | GigaScience, 2020, Vol. 00, No. 0 area is marked as Sensitive-Early, since the activation DDRA is already present at the lowest dose and shortest time-point. With the "cumulative" method (B), the POD is marked as Sensitive-Early, Sensitive-Middle, and Sensitive-Late. The label is obtained by using 80% of the cumulative length of the DDRA border (yellow line). With the "presence" method (C) the POD is labeled as Sensitive-Early, Sensitive-Middle, Sensitive-Late, Intermediate-Early, and Resilient-Early. With the "mix" method (D) the POD is labeled as Sensitive-Middle since the DDRA starts already at lower doses and shorter time points and the area is also the one with a higher coverage by the DDRA. of the DDRA area. These two approaches are suggested when the focus of the study is on a few genes. On the other hand, the "most left" and "mix" approaches identify only the starting point of the gene activation and can be easily used to group genes and give biological interpretation of chain of events. In the analyses conducted in this work we have used the most left approach because it follows the toxicological assumption that a toxicant is considered active at the lowest dose and earliest time point at which its e ect is signi cantly deviating from the control status.

Time and dose in uence by means of gradient
Once the dynamic dose-responsive area is identi ed, the in uence of the time on the fold-change is studied by analyzing the vector eld of gradients in the region. For each point, the magnitude and the angle of the gradient are computed and used to evaluate the time-dose response score as follows: Where g i and mod i are the angle and magnitude of the gradient in the i − th point in the dynamic dose-responsive region and n is the number of points in the same region. This score can be used to categorize the genes based on the dose and time e ect on their fold change. According to the timedose response score, the active genes are divided into four groups, corresponding to the four quadrants of the cartesian space. In the rst quadrant the fold change increases with both time and dose (0 < td s ≤ 90); in the second quadrant, the fold change increases with time, but decreases with dose (90 < td s ≤ 180); in the third quadrant, the fold change decreases with both time and dose (180 < td s ≤ 270); in the fourth quadrant the fold change increases with dose and decreases with time (270 < td s ≤ 360). The four quadrants can be further dissected in three smaller sectors: one in which the dose has a stronger e ect than the time, one in which they have the same e ect and one in which the time has a stronger e ect than the dose. We assign a label to each gene composed by the letters d and t standing for dose and time respectively and a positive or negative sign. If the in uence of one of the two components is stronger than the other, this is highlighted by using capital letters. For example, the label d + T+ stands for fold change increasing both with dose and time with a stronger e ect from the time.

Enrichment of biological pathways
KEGG enrichment analysis is performed by using the methodology implemented in the FunMappOne tool [20]. Pathways were considered signi cantly enriched if they have a corrected p-value lower than 0.05.

Dataset collection
Pathological events registered in rats after drug exposure were downloaded from the Open TG-GATEs database [7] (https://dbarchive.biosciencedbc.jp/en/open-tggates/ download.html). Every drug pathological score was computed by counting the number of events occurring after the exposure normalized by the number of events in the controls. Cyclosporine A and thioacetamide were selected as representative candidates for drugs with low and high toxic impact, respectively [21,22,23,24].
Raw data microarray transcriptomic data for Cyclosporine A and thioacetamide exposure, in in vivo rat liver tissue, were downloaded from the Open TG-GATEs dataset [7] (https:// dbarchive.biosciencedbc.jp/en/open-tggates/download.html). Each dataset consists of 48 samples, of which 12 are unexposed controls and 12 are biological replicates for each dose level. Cyclosporine A was tested with doses 30, 100, and 300 mg/kg, while Thioacetamide with doses 4,15 and 45 mg/kg. In both cases, samples were harvested at four di erent time points (3, 6, 9 and 24 hours). Transcriptomics data were preprocessed using the pipeline implemented in the eUTOPIA tool [25]. The raw data were imported into R v. 3.4 by using the justRMA function from the Bioconductor utilities [26] to annotate probes to Ensembl genes (by using the rat2302rnensgcdf (v. 22.0.0) annotation le from the brainarray website (http://brainarray.mbni.med.umich.edu/), and quantile normalize the resulting expression matrix. Next, the experimental batch e ects due to technical variables were estimated and removed using the ComBat algorithm implemented in the Sva package [26,27]. For each pair of dose and time-point, all the pairwise log-fold-changes for each gene were computed as the di erence between the log2 expression values of each pair of treated and control samples. In this way, we obtained 108 pair-samples and 11721 genes to be used in the TinderMIX analysis.

Results and Discussion
We developed a novel dose and time integrative modelling strategy for transcriptomics data able to identify molecular features with a dynamic dose-responsive alteration pattern ( Figure 1). We showcase our methodology by analyzing in vivo gene expression data for two drugs (cyclosporine A and thioacetamide) available in the Open TG-GATEs dataset.

Dynamic dose-dependent mechanism of action (MOA)
The TinderMIX methodology allows to study the distribution of the gene log-fold-change with respect to both dose and time. For this purpose, TinderMIX implements a strategy similar to the classical BMD analysis [12] but translated into a threedimensional space.
For every gene, a linear, second, and third-order polynomial models are tted ( Figure 1A). The optimal model is selected as the one with adjusted goodness-of-t p-value lower than 0.01 and best modelling performance according to the nested model hypothesis test, as performed by ANOVA. Furthermore, for the genes that pass the goodness-of-t ltering, their dynamic dose-responsiveness is investigated. Hence, TinderMIX maps the 3D optimal model in its corresponding time-dose e ect map by contour plots ( Figure 1B). This step allows identifying the DDRA, by iterating the concept of doseresponsiveness on each time point ( Figure 1C). We consider a gene to be dose-dependently altered if its response is monotonic throughout all the doses at any time point. Given the complex kinetics of gene expression, interpolating the behaviour of the genes between the tested doses increases the robustness of dose-response modelling when multiple time points are assayed.
Starting from a standard gene activation threshold of 10%, TinderMIX identi ed 5,746 and 8,436 dose-responsive genes in cyclosporine A and thioacetamide, respectively (Table 1). In the case of cyclosporine A, most of the genes were tted by the second-order polynomial model, while for thioacetamide by the third-order polynomial model (Table 1). This suggests that there is a predominant non-linear relationship between the log-fold changes, the dose levels, and the time-points. The complete lists of dynamic dose-responsive genes for cyclosporine A and thioacetamide are available in additional les 2 and 3, respectively.
Furthermore, the sensitivity of the TinderMIX method to the activation threshold, was investigated. The analyses were run for di erent activation thresholds (10%, 20%, 30%, 40% and 50%).In both the two drugs present in our case study, the number of dynamic-dose-responsive genes decreases with the increasing of the thresholds (Additional le 4).

TinderMIX labelling for Point of departure
By identifying the rst dose and time point present in the DDRA ( Figure 1C), TinderMIX assigns to each DDRG an activity label ( Figure 1D). The label provides information on the joint timedose POD at a glance. For instance, the labels aid drawing a hypothetical sequence of events. The fact that a gene can re-spond at di erent dose ranges informs on the sensitivity of certain molecules and regulatory machinery to a speci c exposure. The analysis of the dynamic dose-dependent gene proles might give insights about the harmfulness of a compound. Indeed, substances that activate many genes at low doses are probably more toxic than those exerting resilient responses. This is the case of the two drugs we analyzed, as cyclosporine A shows most of the activation at low and high doses and early and middle time points, while thioacetamide shows most of the activation at low doses at all the time points (Figure 3).

E ect of time and dose on the dynamic MOA
Even though the labels that TinderMIX assigns inform on the gene POD, they do not give insights on the relative impact of the dose and time on the variation of the log-fold-change. In order to dissect these e ects and the relative contribution of dose and time to the gene alteration, TinderMIX weighs their e ect in the time-dose e ect maps ( Figure 1B). Indeed, by studying the direction of the gradient in each pixel of the DDRA ( Figure 1C), we are able to assess the contribution of time and dose to the dynamic dose-response behaviour of the gene expression, and whether its e ect is positive or negative ( Figure 1E). Moreover, TinderMIX generates a radar plot that summarizes the relative e ect of dose and time onto the DDRG as well as their direction (Figure 4 and 5). There are di erent scenarios where it is useful to recognize if the genes are more prominently a ected by the dose or the time. For example, genes for which the exposure has a predominant e ect might be more sensitive to the dose. On the other hand, some genes might be under complex regulatory mechanisms, some of which could be secondary to the exposure itself. Thus, their expression is not only altered in a dose-dependent manner, but also kinetically modulated along the time. Indeed, in both cyclosporine A and thioacetamide, late genes undergo a stronger e ect of time than to the dose ( Figures 4A, 4B, 4C, 5A, 5B, 5C), as expected since transcriptomic alterations happening 24h after the exposure are more likely to be due to regulatory processes happening downstream the primary drug induced response. On the other hand, early and middle genes appear to be more a ected by the dose, especially in the case of thioacetamide, a compound with a more pronounced known toxic potential ( Figures 5D, 5E, 5G, 5H). Therefore, our tool provides better insights about the exposure MOA on a biological system, by providing both quantitative and qualitative estimation of the perturbation with respect to time and dose.   (Figures 5E,5F), also the same e ect can be observed in the early, sensitive-middle and late genes (violet and green bars in gures 5A, 5B, 5C, 5D, 5G and green bars in 5H and 5I) even though some of them are negatively a ected by the doses (red bars in gures 5A, 5B, 5C and 5D, 5G, 5H and 5I). 6 | GigaScience, 2020, Vol. 00, No. 0

Pathway enrichment analysis
In order to characterize the biological processes underlying the POD labels assigned to the DDRG, we performed KEGG enrichment analysis of the genes belonging to the nine label categories previously identi ed (additional les 5 and 6). We further grouped the enriched pathways based on the time of activation to draw a kinetic map of the events in the exposure time ( Figure 6). Cyclosporine A is a known immunosuppressant drug, with a low toxic potential. Indeed, at early time points, 108 deregulated pathways were found ( Figure 6A) while, at middle and late time points, only few deregulated pathways (24 and 4 respectively) were obtained. Cyclosporine A is known to inhibit the activation of T lymphocytes by blocking the activity of calcineurin phosphatase [28]. In fact, several interleukins and chemokine-driven immune system mechanisms were found signi cantly deregulated at early time points. In particular, Th1 and Th2 cell di erentiation pathways were found altered. Among the main e ectors of such pathways Cd4, one of the main markers of T lymphocytes, was labelled by TinderMIX as a sensitive-early DDR gene. On the other hand, thioacetamide exposure has been associated with liver toxicity and in ammatory cells in ltration [29]. As it might be expected for a more toxic drug, more deregulated pathways at any time points were retrieved in our analysis ( Figure 6B). Among the ones enriched at early time points, infectious disease pathways such as Hepatitis B and C, as well as apoptosis pathway, were enriched and both apoptotic and necrosis related genes were up/down-regulated, such as Fadd, Fas, Bad and Bid. Consistently with an hepatotoxic induced e ect, the Aldh2 gene, which is known to be altered in patients with chronic hepatitis and non-alcoholic cirrhosis, was found also deregulated in intermediate pathways [30].

Visual inspection of the gene maps
To complement the information given by the POD labels, the time-dose e ect maps of the DDRG allow to visualize the whole kinetics of the log-fold-changes in the joint dose-time space. Furthermore the levels of the contour plots specify if the gene is up (+) or down (-) regulated. The e ect maps of the previously identi ed genes are described as an example. As we can see from gure ( Figure 7A

Conclusion
In this study, we proposed TinderMIX, a novel methodology for the joint time and dose modelling of toxicogenomics data. TinderMIX consists of di erent steps that combine polynomial model tting, active region extraction and pathway enrichment analysis in order to identify genes with joined dose and time patterns of responsiveness. TinderMIX allows the user to study the dynamic behaviour of the genes in the joint dosetime space, by interpolating the omics feature levels and lling the gaps between the doses and time points tested in the experiment. TinderMIX automatically assigns to the responsive genes an activation label that speci es the joint dose and time point of departure (POD) of the gene and estimates the strength of the e ect of time and dose on each gene activation. Moreover, it allows to graphically inspect the gene maps as contour plots. Each gene map can give insights into the dynamic and dose-dependent shape of the gene log-fold-change. It easily shows the point of departure of the gene and it highlights the monotonic trend of the responsive area. The TinderMIX methodology also helps in grouping the genes based on the activation label and to identify the set of pathways associated with each group in order to better characterise the underlying biological mechanisms. In conclusion, TinderMIX can be used to investigate the point of departure of genes with respect to dose and time point upon chemical exposure with an integrated analytical approach.

Availability of source code and requirements
The TinderMIX method is available in the form of an R package, which is available via a git repository: • Project name: TinderMIX • Project home page: https://github.com/grecolab/TinderMIX • Operating system(s): Platform independent • Programming language: R • Other requirements: Java • License: GNU GPL (>= 3)

Availability of supporting data and materials
The data used to showcase the TinderMIX methodology are available in the git repository at https://github.com/grecolab/ TinderMIX/tree/master/sample_data.   have been used to help the interpretation of transcriptomic dose-response data and to derive transcriptomic PODs. In particular, by modelling the patterns of dose-responsiveness of gene expression, the BMD method provides an estimate of the lowest dose of a chemical able to induce a signi cant change in biological activity.

Additional Files
However, assaying the mechanism of action of exposure at multiple time points is valuable to highlight useful information on the kinetic molecular responses of a biological system. For example, the experimental setup for the in vivo studies in the Open TG-GATEs involves animal treatment in which each drug is tested at three doses (low, medium and high) at multiple time-points [? ]. Thus, one major challenge is the interpretation of these toxicogenomics data, especially for identifying patterns of alterations of the gene expression that are correlated both to the dose levels and the time points. Di erent methods have been proposed to identify genes that show a dose-response e ect [? ? ] or to study the gene expression dynamics at individual time points [? ? ]. However, this kind of approach does not allow an easy interpretation of the time-dependent dynamics happening after the exposure and the molecular adaptation process. To date, few e orts have been done to propose a methodology able to identify sets of genes that show similar expression patterns with respect to both dose and time [? ].
In this work, we propose a new computational framework, TinderMIX, for dose-and time-dependent gene expression analysis that aims to combine dimensionality reduction, BMD analysis, and polynomial tting to nd groups of genes that show a dynamic dose-response (DDR) behaviour. The integrated modelling used in TinderMIX allows us to interpolate the continuous joint dose-time space and predict the molecular alteration values for the doses and time-points not included in the original experiment. Moreover, our approach is importantly able to inform on POD in both the dimensions of the doses and time points, hence resolving at once two analytical tasks that, thus far, have only been carried out subsequently to each other.
To illustrate our methodology we analyzed the gene expression data for cyclosporine A and thioacetamide from the Open TG-GATEs database.

Materials and Methods
The TinderMIX methodology proposed in this study starts from the sample-wise log-fold-change of the genes and is able to identify which of them show a DDR behaviour and to estimate their joint dose-time POD. The methodology is composed of multiple steps that can be grouped into two parts: the genes modelling with POD identi cation (Figure ??) that is executed for every gene in the dataset, and the POD interpretation of the dynamic dose-responsive genes (DDRG) identi ed in the rst part (Figure ??).
As for the classical BMD analysis, the rst step of the gene modeling analysis (Figure ??A) consists of tting di erent polynomial functions to each gene log-fold-change and identifying the optimal one by using a Nested Model Hypothesis test [? ]. The di erence with the classical BMD analysis performed so far is that the tted models are three-dimensional (3D) functions in the space of the dose, log-fold-change, and time. Indeed, so far, the BMD analysis has mainly been performed in the dose and log-fold-change space for each time point individually. Afterward, the optimal 3D model is mapped in the dose-time dimensional space by means of contour plots (Figure ??B). A monotonically increasing or decreasing (with respect to the doses) dynamic dose-responsive area (DDRA) is identi ed starting from a user-speci ed activation threshold. (Figure ??C). If this DDRA is present, the gene is considered DDR and his POD is identi ed based on the rst dose level and activation time (Figure ??D). Moreover, the e ect of dose and time in the DDRA is compared in order to understand if the modi cations in the log-fold-changes values are due to a positive or negative e ect of the time and dose and also to understand which of the two variables contribute most (Figure ??E). Indeed, if the time has a stronger e ect compared to the dose we can hypothesize that the e ect under analysis can be an adaptation process, on the other hand, if the dose contributes the most we can hypothesize that we are observing an e ect directly correlated to the exposure. In the second part of the analysis, the DDR genes identi ed can be grouped according to their POD labels (Figure ??F). Eventually, overrepresented pathways can be calculated for each gene category (Figure ??G).

Time-dose Fitting
For every gene, linear regression models including 1st, 2nd and 3rd order polynomials (equations 1-3) of the explanatory vari- Figure 1. TinderMIX methodology. The TinderMIX methodology is composed of several steps that are grouped into gene modeling and POD interpretation. First, starting from the pairwise log-fold-change of each gene a 3D polynomial model is tted (A). Second, the 3D model is mapped in its 2D e ect map by means of contour plots. The lines in the contour plots describe the shape of the 3D model by showing the portion of the space where the tted model has a constant log-fold-change, indicated by the value on the lines. A positive value means that the gene is up-regulated otherwise it is down-regulated (B). Third, the dynamicdose-responsive area (DDRA) is identi ed. Starting from a user-de ned activity threshold, the time-dose e ect map is divided into regions of di erent colors. A blue region is a part of the map where the log-fold-change is lower than the threshold, thus marked as a non-active area. A green or red area indicates an activation, with a log-fold-change greater than the threshold. In particular, a green area (called Dynamic Dose-Responsive Increasing Area) is characterized by an increase of the log-fold-change when the dose increases, while a red area (called Dynamic Dose-Responsive Decreasing Area) presents a decrease of log-the fold-change when the dose increases. In the example shown in (D) the highlighted area is colored in green indicating an increasing of the log-fold-change with respect of the dose. A gene is considered DDR, if there is a DDRA in the activity map, with a monotonic behavior of the log-fold-change from a dose to the highest one tested. In the map, the front of the dose-responsive area is marked with a yellow line. Moreover, a red line is used to mark the IC50 front, which is the line connecting the set of doses (one for each time point) that speci es the 50% of gene activity (C). Fourth, based on the rst dose and time of activation the gene is labelled with an activation label that speci es its dose and time POD. The time-dose e ect map is divided into a 3 by 3 grid. The sections of the dose axis were named "S" (sensitive), "I" (intermediate), and "R" (resilient), while for the time axis the labels "E" (early), "M" (middle), and "L" (late), were assigned. The nal label was then obtained by identifying the earliest and most sensitive point of activation and concatenating the dose and time of the single labels (D). Fifth, the e ect of the dose and time in the DDRA is compared (E). The second part of the analysis consists of the POD interpretation. The letters d and t stand for dose and time, respectively. In the label, a capital letter means that there is a stronger e ect of a variable with respect to the other. If the two letters are both capitals it means that the e ect is of similar intensity. Concordance in the increase of the fold change and dose/time is indicated with a +, while decrease with a -. Thus, the DDRGs identi ed are grouped by their POD labels (F) and eventually a pathway enrichment analysis for the genes in each category is performed (G).
ables (log-fold-change) are tted. These models are selected as representative of those used in classical BMD analysis [? ? ? ]. Indeed, unlike the gamma and logistic models, polynomials are easily generalised to more than one input variables and they can be used for reliable approximation of other functions.
The tting was performed by using the R lm function from the stats package [? ]. The best-tting model is selected performing a Nested Model Hypothesis test [? ]. The test is performed using the R function ANOVA, which takes a list of models as an input. Each model in the list is considered as the "full" model with respect to the previous "restricted" model in the list, where a number of partial regression coe cients are set to zero. In other words, a model is restricted, with respect to the full model, in the sense that it does not consider one or more explanatory variables. For each pair of subsequent models in the list, we test the signi cance of the amount of variance explained by a subset of predictors of the restricted model, compared to the variance explained by all the variables of the full model. We also add to the test, at the head of the list of models, an implicit constant model in which all the partial regression coe cients are zero, except for the intercept β 0 . This is done to assess the quality of the t in the rst place. If the p-values corresponding to testing the models in Eq. 1-3 against the constant model are not signi cant, that gene is not considered as having a dose-response e ect. If, on the other hand, there is a signi cant amount of variance explained by any of the models in Eq. 1-3, then we return the tted model corresponding to the lowest p-value.

Time-dose e ect map prediction and Dynamic doseresponsive genes identi cation
For each gene, the selected model is used to predict an activity map with the values of a smooth log-fold-change function on a grid of 50 x 50 points covering the entire range of doses and time-points tested. This map is represented as a contour plot and used to identify the dose-response area. A desired activity threshold corresponding to a 10% increase/decrease with respect to controls is set to identify the responsive area of each gene time-dose e ect map. If a gene does not show an activity satisfying the threshold it is removed from the analysis. A gene is also removed from the analysis if the responsive area does not include the highest dose of the experimental setting. The selected threshold of 10% is a default threshold used in BMD analysis of transcriptomics data [? ? ? ]. For the genes passing the previous ltering, the gradient of the smooth log-foldchange function is computed at each point of the 50 x 50 grid. Each point of the grid is segmented according to if its connected components show a monotonically (with respect to the whole dose range) increment or decrement of the log-fold-change values. If only one component, either with increasing or decreasing behaviour, is present in the map and it contains the highest dose, then the component is selected as the candidate DDRA. However, if the highest dose is not contained in the area, then the gene is considered not responding and removed from the analysis. On the other hand, if multiple components with di erent increasing or decreasing behaviour, are present in the map, one of the two is selected as candidate dose-responsive area, according to the criteria described in additional le 1. In particular, each candidate region is weighted according to the following criteria: Where n p is the number of points included in the region, n tp−md is the number of time-points (rows in the gene map) that are included in the candidate region that include the highest dose and m d is the minimum dose covered by the candidate region and n is the number of candidate regions found in the gene maps. The optimal region is selected as the one maximizing the score. The active region is further reduced, by removing time-points that are still active (log-fold-change > 1.1) outside the optimal region but with a non-monotonic behaviour compared to the one inside the optimal region.
We identify the external borders of the segmented responsive area using a trace-boundary algorithm. The doseresponsive front is identi ed as the smallest dose present in the responsive area for each timepoint. Moreover, the IC50 front is also computed as the doses that give 50% of changes in the log-fold-change at every time-point.

POD Label assignment
The dose-time space is partitioned into 3 × 3 regions, where each dose can be labelled as sensitive, intermediate or resilient, and each time point is labelled as early, middle or late. Different strategies to identify the POD of the dynamic-doseresponsive genes are implemented in TinderMix (Figure ??). The rst method is called "most left" strategy (Figure ??A). It looks for the rst available POD region by identifying the most sensitive point of activation (lowest dose) and then its corresponding earliest time point. The second strategy, called "presence", identi es as PODs all the regions containing the DDRA border marked as a yellow line (Figure ??B). The third method, called "cumulative", ranks the regions in the 3 × 3 scheme based on the percentage of points of the border of the DDRA that they contain. Then the regions needed to cover the X% of the border are identi ed and marked as POD. X is a threshold speci ed by the user (Figure ??C). In case of X = 100, the "cumulative" method gives the same result of the "presence method". The fourth method, called "mix", creates a score for each region that takes into account how close the region is to the lowest dose and earliest time point, how much area of the region is covered by the DDRA and how many points of the DDRA border are contained in it. The region with the higher score is selected as POD (Figure ??D). When the most sensitive and most early region is also the one with higher coverage, the "mix" method and the "most left" one give the same results. In all the approaches the nal POD label is obtained by concatenating the dose and time of the single labels. For example, a gene with label sensitive-early is a gene that shows a response already at low doses and early time points. On the other hand, a resilient-late gene shows a response only at high doses and late time points. It is important to highlight that the labeling strategies are meant to give an indication of the POD, but they are not self explanatory of the whole gene expression dynamic. The "presence" and "cumulative" approaches identify multiple regions as POD giving a better approximation of the shape of the DDRA area. These two approaches are suggested when 4 | GigaScience, 2020, Vol. 00, No. 0 Figure 2. Di erent methodologies implemented to identify the dynamic-doseresponse POD on a sample gene map. With the "most left" method (A) the POD area is marked as Sensitive-Early, since the activation DDRA is already present at the lowest dose and shortest time-point. With the "cumulative" method (B), the POD is marked as Sensitive-Early, Sensitive-Middle, and Sensitive-Late. The label is obtained by using 80% of the cumulative length of the DDRA border (yellow line). With the "presence" method (C) the POD is labeled as Sensitive-Early, Sensitive-Middle, Sensitive-Late, Intermediate-Early, and Resilient-Early. With the "mix" method (D) the POD is labeled as Sensitive-Middle since the DDRA starts already at lower doses and shorter time points and the area is also the one with a higher coverage by the DDRA. the focus of the study is on a few genes. On the other hand, the "most left" and "mix" approaches identify only the starting point of the gene activation and can be easily used to group genes and give biological interpretation of chain of events. In the analyses conducted in this work we have used the most left approach because it follows the toxicological assumption that a toxicant is considered active at the lowest dose and earliest time point at which its e ect is signi cantly deviating from the control status.

Time and dose in uence by means of gradient
Once the dynamic dose-responsive area is identi ed, the in uence of the time on the fold-change is studied by analyzing the vector eld of gradients in the region. For each point, the magnitude and the angle of the gradient are computed and used to evaluate the time-dose response score as follows: Where g i and mod i are the angle and magnitude of the gradient in the i − th point in the dynamic dose-responsive region and n is the number of points in the same region. This score can be used to categorize the genes based on the dose and time e ect on their fold change. According to the timedose response score, the active genes are divided into four groups, corresponding to the four quadrants of the cartesian space. In the rst quadrant the fold change increases with both time and dose (0 < td s ≤ 90); in the second quadrant, the fold change increases with time, but decreases with dose (90 < td s ≤ 180); in the third quadrant, the fold change decreases with both time and dose (180 < td s ≤ 270); in the fourth quadrant the fold change increases with dose and decreases with time (270 < td s ≤ 360). The four quadrants can be further dissected in three smaller sectors: one in which the dose has a stronger e ect than the time, one in which they have the same e ect and one in which the time has a stronger e ect than the dose. We assign a label to each gene composed by the letters d and t standing for dose and time respectively and a positive or negative sign. If the in uence of one of the two components is stronger than the other, this is highlighted by using capital letters. For example, the label d + T+ stands for fold change increasing both with dose and time with a stronger e ect from the time.

Enrichment of biological pathways
KEGG enrichment analysis is performed by using the methodology implemented in the FunMappOne tool [? ]. Pathways  (3, 6, 9 and 24 hours). Transcriptomics data were preprocessed using the pipeline implemented in the eUTOPIA tool [? ]. The raw data were imported into R v. 3.4 by using the justRMA function from the Bioconductor utilities [? ] to annotate probes to Ensembl genes (by using the rat2302rnensgcdf (v. 22.0.0) annotation le from the brainarray website (http://brainarray.mbni.med.umich.edu/), and quantile normalize the resulting expression matrix. Next, the experimental batch e ects due to technical variables were estimated and removed using the ComBat algorithm implemented in the Sva package [? ? ]. For each pair of dose and time-point, all the pairwise log-fold-changes for each gene were computed as the di erence between the log2 expression values of each pair of treated and control samples. In this way, we obtained 108 pair-samples and 11721 genes to be used in the TinderMIX analysis.

Results and Discussion
We developed a novel dose and time integrative modelling strategy for transcriptomics data able to identify molecular features with a dynamic dose-responsive alteration pattern (Figure ??). We showcase our methodology by analyzing in vivo gene expression data for two drugs (cyclosporine A and thioacetamide) available in the Open TG-GATEs dataset.

Dynamic dose-dependent mechanism of action (MOA)
The TinderMIX methodology allows to study the distribution of the gene log-fold-change with respect to both dose and time. For this purpose, TinderMIX implements a strategy similar to the classical BMD analysis[? ] but translated into a threedimensional space.
For every gene, a linear, second, and third-order polynomial models are tted (Figure ??A). The optimal model is selected as the one with adjusted goodness-of-t p-value lower than 0.01 and best modelling performance according to the nested model hypothesis test, as performed by ANOVA. Furthermore, for the genes that pass the goodness-of-t ltering, their dynamic dose-responsiveness is investigated. Hence, TinderMIX maps the 3D optimal model in its corresponding time-dose e ect map by contour plots (Figure ??B). This step allows identifying the DDRA, by iterating the concept of doseresponsiveness on each time point (Figure ??C). We consider a gene to be dose-dependently altered if its response is monotonic throughout all the doses at any time point. Given the complex kinetics of gene expression, interpolating the behaviour of the genes between the tested doses increases the robustness of dose-response modelling when multiple time points are assayed.
Starting from a standard gene activation threshold of 10%, TinderMIX identi ed 5,746 and 8,436 dose-responsive genes in cyclosporine A and thioacetamide, respectively ( Table ??).
In the case of cyclosporine A, most of the genes were tted by the second-order polynomial model, while for thioacetamide by the third-order polynomial model (Table ??). This suggests that there is a predominant non-linear relationship between the log-fold changes, the dose levels, and the time-points. The complete lists of dynamic dose-responsive genes for cyclosporine A and thioacetamide are available in additional les 2 and 3, respectively.
Furthermore, the sensitivity of the TinderMIX method to the activation threshold, was investigated. The analyses were run for di erent activation thresholds (10%, 20%, 30%, 40% and 50%).In both the two drugs present in our case study, the number of dynamic-dose-responsive genes decreases with the increasing of the thresholds (Additional le 4).

TinderMIX labelling for Point of departure
By identifying the rst dose and time point present in the DDRA (Figure ??C), TinderMIX assigns to each DDRG an activity label (Figure ??D). The label provides information on the joint time-dose POD at a glance. For instance, the labels aid drawing a hypothetical sequence of events. The fact that a gene can respond at di erent dose ranges informs on the sensitivity of certain molecules and regulatory machinery to a speci c exposure. The analysis of the dynamic dose-dependent gene pro les might give insights about the harmfulness of a compound. Indeed, substances that activate many genes at low doses are probably more toxic than those exerting resilient responses. This is the case of the two drugs we analyzed, as cyclosporine A shows most of the activation at low and high doses and early and middle time points, while thioacetamide shows most of the activation at low doses at all the time points (Figure ??).

E ect of time and dose on the dynamic MOA
Even though the labels that TinderMIX assigns inform on the gene POD, they do not give insights on the relative impact of the dose and time on the variation of the log-fold-change. In order to dissect these e ects and the relative contribution of dose and time to the gene alteration, TinderMIX weighs their e ect in the time-dose e ect maps ( Figure 1B). Indeed, by studying the direction of the gradient in each pixel of the DDRA (Figure 1C), we are able to assess the contribution of time and dose to the dynamic dose-response behaviour of the gene expression, and whether its e ect is positive or negative ( Figure 1E). Moreover, TinderMIX generates a radar plot that summarizes the relative e ect of dose and time onto the DDRG as well as their direction (Figure ?? and ??). There are di erent scenarios where it is useful to recognize if the genes are more prominently a ected by the dose or the time. For example, genes for which the exposure has a predominant e ect might be more sensitive to the dose. On the other hand, some genes might be under complex regulatory mechanisms, some of which could be secondary to the exposure itself. Thus, their expression is not only altered in a dose-dependent manner, but also kinetically modulated along the time. Indeed, in both cyclosporine A and thioacetamide, late genes undergo a stronger e ect of time than to the dose (Figures ??A, ??B, ??C, ??A, ??B, ??C), as expected since transcriptomic alterations happening 24h after the exposure are more likely to be due to regulatory processes happening downstream the primary drug induced response. On the other hand, early and middle genes appear to be more a ected by the dose, especially in the case of thioacetamide, a compound with a more pronounced known toxic potential (Figures ??D, ??E, ??G, ??H). Therefore, our tool provides better insights about the exposure MOA on a biological system, by providing both quantitative and qualitative estimation of the perturbation with respect to time and dose.  Figures ??H-??E), while the e ect of time is more evenly distributed across the possible combinations (Figure ??). The early and middle genes mostly show an increase of the log-fold-change with respect to the time (Figures ??E,  ??F, ??H, ??I), whereas the fold-change of late genes decreases over time. Considering the dose, the log-fold-change of sensitive-late genes mainly decreases as the dose increases. (Figures ??D, ??A) Di erently, the intermediate and resilient genes show an increasing trend with respect to the dose (Figures ??C, ??E, ??F,  ??H, ??I).

Figure 5.
Thioacetamide distribution of the number of responsive genes with respect to time and dose. The letters d and t indicate dose and time, respectively. In the label, a capital letter means that there is a stronger e ect of a variable with respect to the other. If the two letters are both capitals it means that the e ect is comparable. Concordance in the increase of the fold change and dose/time is indicated with a "+", while decrease with a "-". For thioacetamide, the e ect of the time on the dynamic dose-responsive genes is stronger than the e ect of the dose (capital T in Figures ??A, ??B, ??C, ??F, ??H, ??I). The increase of time corresponds to an increase in the gene expression in the intermediate-middle and resilient-middle genes (Figures ??E,??F). An opposite e ect is visible on the intermediate-early and resilient early genes (Figures ??H and ??I). In all the other sets of genes the positive and negative e ect of the time on the fold-change is balanced (Figures ??A, ??B, ??C, ??D, ??G). With respect to the dose, intermediate-middle and resilient middle genes mainly show a direct correlation between the increase of the fold-change and the dose (Figures ??E,??F 6 | GigaScience, 2020, Vol. 00, No. 0

Pathway enrichment analysis
In order to characterize the biological processes underlying the POD labels assigned to the DDRG, we performed KEGG enrichment analysis of the genes belonging to the nine label categories previously identi ed (additional les 5 and 6). We further grouped the enriched pathways based on the time of activation to draw a kinetic map of the events in the exposure time (Figure ??). Cyclosporine A is a known immunosuppressant drug, with a low toxic potential. Indeed, at early time points, 108 deregulated pathways were found (Figure ??A) while, at middle and late time points, only few deregulated pathways (24 and 4 respectively) were obtained. Cyclosporine A is known to inhibit the activation of T lymphocytes by blocking the activity of calcineurin phosphatase [? ]. In fact, several interleukins and chemokine-driven immune system mechanisms were found signi cantly deregulated at early time points. In particular, Th1 and Th2 cell di erentiation pathways were found altered. Among the main e ectors of such pathways Cd4, one of the main markers of T lymphocytes, was labelled by TinderMIX as a sensitive-early DDR gene. On the other hand, thioacetamide exposure has been associated with liver toxicity and in ammatory cells in ltration [? ]. As it might be expected for a more toxic drug, more deregulated pathways at any time points were retrieved in our analysis (Figure ??B). Among the ones enriched at early time points, infectious disease pathways such as Hepatitis B and C, as well as apoptosis pathway, were enriched and both apoptotic and necrosis related genes were up/down-regulated, such as Fadd, Fas, Bad and Bid. Consistently with an hepatotoxic induced e ect, the Aldh2 gene, which is known to be altered in patients with chronic hepatitis and non-alcoholic cirrhosis, was found also deregulated in intermediate pathways [? ].

Visual inspection of the gene maps
To complement the information given by the POD labels, the time-dose e ect maps of the DDRG allow to visualize the whole kinetics of the log-fold-changes in the joint dose-time space. Furthermore the levels of the contour plots specify if the gene is up (+) or down (-) regulated. The e ect maps of the previously identi ed genes are described as an example. As we can see from gure ( Figure ??A), Cd4 is marked as sensitive-early, and it is down-regulated at early time points (log-fold-change in [-0. . On the other hand, the Aldh2 gene is labelled as sensitive-middle since the DDRA begins at the lowest doses and middle time points (Figure ??B). It does not show any activity at early time points with low and middle doses, while it is dynamically dose-dependent and down-regulated already at early time points with high doses. Its strength of downregulation increases with both time and dose (log-fold-change in [-0.15,-0.45]). The overall downregulation visible in the map has already been reported in previous studies demonstrating that, in rats, thioacetamide can directly inhibit the Aldh2 isoenzyme[? ].

Conclusion
In this study, we proposed TinderMIX, a novel methodology for the joint time and dose modelling of toxicogenomics data. TinderMIX consists of di erent steps that combine polynomial model tting, active region extraction and pathway enrichment analysis in order to identify genes with joined dose and time patterns of responsiveness. TinderMIX allows the user to study the dynamic behaviour of the genes in the joint dosetime space, by interpolating the omics feature levels and lling the gaps between the doses and time points tested in the experiment. TinderMIX automatically assigns to the responsive genes an activation label that speci es the joint dose and time point of departure (POD) of the gene and estimates the strength of the e ect of time and dose on each gene activation. Moreover, it allows to graphically inspect the gene maps as contour plots. Each gene map can give insights into the dynamic and dose-dependent shape of the gene log-fold-change. It easily shows the point of departure of the gene and it highlights the monotonic trend of the responsive area. The TinderMIX methodology also helps in grouping the genes based on the activation label and to identify the set of pathways associated with each group in order to better characterise the underlying biological mechanisms. In conclusion, TinderMIX can be used to investigate the point of departure of genes with respect to dose and time point upon chemical exposure with an integrated analytical approach.

Title: Time-Dose INtegrated moDelling of toxicogenomics data
Dear editor, please find enclosed a revised version of our manuscript. We addressed the comments raised by the reviewers. We would like to thank the reviewers for constructive feedback. We believe that this revised manuscript is significantly improved from the initial submission and we look forward to your positive feedback.
Yours Sincerely, Dario Greco PhD, Associate Professor, Faculty of Medicine and Health Technology Tampere University Finland

Reviewer 1
General : In this paper, the authors propose a new approach, called TinderMIX, for simultaneously studying time and dose effects in toxicogenomics. Although many toxicogenomics experiments include data from different doses measured at different time points, only a few bioinformatics approaches to integrate time and dose effects exist, and their applications are limited to network identification and response modeling. The new method proposed by the authors extends existing methods by identifying genes with joined dose and time patterns of responsiveness, and predicting the molecular alteration values for the doses and time-points not included in the original experiment. Moreover, the approach assigns an activation label to each gene describing the time (early, middle, late) and dose effect (sensitive, intermediate, resilient). Genes are grouped based on the activation label, and pathway analysis can be performed on each group in order to better characterise the underlying biological mechanisms. Moreover, the method is available as a user-friendly R package on GitHub. To the best of my knowledge, this is the first bioinformatics approach performing such an extensive analysis of both time and dose effects in toxicogenomics. This makes the method described in this manuscript of particular interest for improving our understanding of the mode of action of compounds. However, before I can recommend the manuscript for publication in GigaScience, a few issues have to be addressed:

Major :
Click here to access/download;Personal Cover;reply_revievers_tindermix.docx