BELMM: Bayesian model selection and random walk smoothing in time-series clustering

Abstract

Motivation: Due to advances in measurement technology, many new phenotype, gene expression, and other omics time-course datasets are now commonly available. Cluster analysis may provide useful information about the structure of such data.

Results: In this work, we propose BELMM (Bayesian Estimation of Latent Mixture Models): a flexible framework for analysing, clustering, and modelling time-series data in a Bayesian setting. The framework is built on mixture modelling: first, the mean curves of the mixture components are assumed to follow random walk smoothing priors. Second, we choose the most plausible model and the number of mixture components using Reversible-jump Markov chain Monte Carlo (RJMCMC). Last, we assign the individual time series to clusters based on their similarity to the cluster-specific trend curves determined by the latent random walk processes. We demonstrate the use of fast and slow implementations of our approach on both simulated and real time-series data using the widely available software R, Stan, and CU-MSDSp.

Availability and implementation: The French mortality dataset is available at http://www.mortality.org and the Drosophila melanogaster embryogenesis gene expression data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. Details on our simulated datasets are available in the Supplementary Material, and R scripts and a detailed tutorial on GitHub at https://github.com/ollisa/BELMM. The software CU-MSDSp is available on GitHub at https://github.com/jtchavisIII/CU-MSDSp.


Foreword
This supplementary material is organised as follows: First, we complement the discussion of model selection with Reversible-jump MCMC, followed by an explanation of the rectified model posterior distribution. Then, we provide detailed results from the simulation studies and the two empirical analyses. All the R code for producing the figures and tables is available in an accompanying R file, together with code for data simulation and processing (on GitHub at https://github.com/ollisa/BELMM). The actual time-series data sets are not included (the French mortality data due to a license agreement, the Drosophila melanogaster data due to copyright concerns). Still, they can be downloaded free of charge from their original repositories: the French mortality data from http://www.mortality.org/ and the Drosophila melanogaster data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. The "results" files from CU-MSDSp are also not included, since they are large and contain the aforementioned data sets.

Why is the method called "latent"?
This wording refers to the more general concept of "latent variable models". In short, "latent" reflects the model assumption that some unobserved parameters influence the observed data.

Reversible jump MCMC Additional discussion points
The RJMCMC approach has some notable advantages over its information criterion-based counterparts. First, we have a high level of flexibility regarding the model structures we can compare. The models need not be nested: they typically are when we only want to determine the number of components K, but non-nested models can also be compared. With RJMCMC, it is possible to compare models where each of the mixture components has its own model structure. Between the models, each of the mixture components can also have its own set of priors, say a mixture of Gaussian and Laplace distributed components, or a mixture of components with a different level of smoothing in each if we want some of the components to have more variation than others. Second, we obtain the model posterior distribution, not just a list of values for the different information criteria. This allows us to make sensible comparisons even in "apples and oranges" situations, and it can also be beneficial when there is no clear-cut solution to the application at hand. However, as a drawback, the RJMCMC approach carries a higher computational burden.
Last, we have left open the interpretation of the model posterior distribution. In traditional RJMCMC, if the only difference between the models were the number of components K, the model posterior distribution would be the posterior distribution of the number of components, P(K | Data). However, CU-MSDSp utilises neither the Birth-and-Death nor the Birth-Combine-Death process on which the traditional methods are based (see, e.g., [Cappé et al., 2003]). This means that its implementation of RJMCMC does not penalise models with false or loosely supported components, which can skew the model posterior distribution.
On the model level, an obvious first remedy would be setting the prior range for the number of mixture components such that the weights of the false components are restricted, a priori, to zero; any models with false components could then be left out of the RJMCMC altogether. This would require more work before the actual model selection, somewhat defeating the purpose of using RJMCMC. Another hit to the objectivity of the model selection arises from the 'curse of dimensionality'. Theoretically, it makes sense to avoid the false components. However, even with a restrictive prior structure, the estimated weights will be small but never truly zero. Hence, if the number of observations is sufficiently large, some will be 'erroneously' assigned to the loosely supported clusters. One would then need to remove these models with seemingly plausible mixture components from the model selection without observing their effect on the model posterior distribution.
On the RJMCMC level, one way to address this problem would be to set a more informative (restrictive) model-prior distribution favouring the solutions with fewer components. In the current implementation of CU-MSDSp, the prior is flat [Chavis et al., 2021], a neutral, uninformative choice. We note that changing the prior could limit the applicability and generality of RJMCMC. A flat prior may even be a beneficial choice when many models are estimated simultaneously, since the probability of each will likely be small from the start; hence, the chance of a model with falsely supported components "dying" (its probability in the model posterior distribution approaching 0) should increase. Our examples did not show this behaviour, but it opens an interesting research topic for the future (cf. [van Havre et al., 2015]).

Rectified model posterior distribution
Considering the above, our approach to handling the occurrence of empty components is to form a rectified model posterior distribution whenever false components are present (with respect to the assignment rule). We do this as a post-processing step by marginalising the model posterior distribution over the number of non-empty components. An equally good alternative would be to consider all components whose weights fall under a given threshold as 'false'. In practice, we carry out the marginalisation by transferring the amount of support gained by a model with false components to the model with the matching number of true components. While not necessarily optimal or theoretically justified, the benefit of this approach is that no changes are required to the existing RJMCMC implementation. It also worked quite well for the simulated data sets and the French mortality data, where the empty components were observed.
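The marginalisation described above can be sketched in a few lines. This is an illustrative sketch, not the authors' R implementation; the function and variable names are hypothetical:

```python
# Rectify a model posterior distribution by transferring the support of
# models containing empty ("false") components to the model with the
# matching number of non-empty components.

def rectify_posterior(posterior, n_nonempty):
    """posterior: dict {K: P(model with K components | data)};
    n_nonempty: dict {K: number of non-empty components after assignment}.
    Returns the rectified posterior, keyed by the number of true components."""
    rectified = {}
    for k, p in posterior.items():
        k_true = n_nonempty[k]  # marginalise over the non-empty components
        rectified[k_true] = rectified.get(k_true, 0.0) + p
    return rectified

# Hypothetical example: the 4-component model left one component empty,
# so its support is transferred to the 3-component model.
posterior = {2: 0.10, 3: 0.55, 4: 0.35}
n_nonempty = {2: 2, 3: 3, 4: 3}
print(rectify_posterior(posterior, n_nonempty))  # {2: 0.1, 3: 0.9}
```

Because the step only reshuffles probability mass, the rectified distribution still sums to one and requires no re-running of the RJMCMC.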

Closing remarks
While our discussion mainly concerns the model posterior distribution for choosing the optimal number of mixture components, its use need not be limited to that purpose. The topic as a whole is too broad for an in-depth review within the limits of this paper, as, in the more general case, the interpretation of the model posterior distribution depends on the differences between the models in question.
We would also like to mention the approach of Guha et al. [2021]. They introduce the Merge-Truncate-Merge algorithm, which, together with some post-processing, is reported to solve the issue of false components for a mixture model. While their method can find the optimal number of components as is, their strategy could also be used for merging any false components in the estimated Stan models before the RJMCMC step in CU-MSDSp. This could be considered another strategy for obtaining the rectified model posterior distribution.

Simulation study
This section contains results for the simulated data sets. We begin with an analysis of the Toydata set available with the R package TMixClust [Golumbeanu et al., 2019]. More information on the creation of the data set can be found in [Golumbeanu, 2020]. Here, we give a more in-depth outline of the process, followed by an analysis of our own simulated data sets.

Data preparation and the models
We begin by loading the simulated time-series data provided in TMixClust (Figure 1). A Stan data object was then created, containing the data, the smoothing prior, and values for other fixed parameters. The object is then written into a JSON file (an example function is found in the R file). As for the models, we estimated three nested models. To avoid label switching, an order restriction is assumed on the prior distribution of the initial value of each random walk process, a standard approach in mixture modelling (ordering through the means). For this example, we assume the same level of smoothness for each of the mixture components.

Model estimation
The posterior distributions of the parameters of the individual models are estimated through CU-MSDSp; for details, see Appendix 6. We used different lengths of MCMC chains. Typically, the models converge by setting 'NBURN' = 4000. Additionally, 4000-15000 MCMC samples were collected after the warm-up.

Assignment
The model is fully probabilistic at this point, and none of the individual time series is labelled. With the assignment step, we can use the parameter information to cluster the observations. We show our approach to this, but note that many ways of using the MCMC chain are at least partially subjective.
The first and easiest approach is to use built-in R functions to calculate the posterior medians for all the model parameters; the median is a more robust estimator than the mean for possibly skewed posterior distributions. We plot the estimated cluster centres (Figure 2a). These are the means of the random walk processes. Note that they are not mean values of the observed data but the latent values assumed to underlie the observed data. In our fitted models, they are denoted by l1.1, l1.2, and so on, and are collected into L1, L2, .... These can be plotted over the data to get an idea of where the clusters will be located; it is also quite easy to spot if something goes wrong (a bad model fit).
Next, we apply the assignment estimation. We use the posterior-estimated mixture weights and variance parameters in our function. The main part of the function is an exponential function, into whose argument we can place a distance function of choice. In our case, the distance function is the Euclidean distance to a cluster's centre, weighted by the mixture component's estimated variance parameter. Applying this function to the individual time series, we get a matrix of unnormalised cluster probabilities, i.e., a prototype cluster membership matrix. These are not necessarily probabilities, but since normalisation does not change their order, they can be used for assigning each observation to the most plausible cluster. The sizes of the clusters can be checked with the table() function and plotted with ts.plot() or another similar function (Figure 2b). If the resulting assignment is favourable, we can stop here.
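The assignment step above can be sketched as follows. This is an illustrative sketch, not the paper's R code; all names (and the exact form of the variance weighting) are assumptions:

```python
import numpy as np

# For each time series y_i, compute an unnormalised cluster membership
# score from the mixture weight and an exponentiated, variance-weighted
# Euclidean distance to each estimated cluster centre.

def membership_matrix(Y, centres, weights, variances):
    """Y: (n, T) time series; centres: (K, T) posterior-median centres;
    weights, variances: length-K arrays. Returns an (n, K) score matrix."""
    n, K = Y.shape[0], centres.shape[0]
    scores = np.empty((n, K))
    for k in range(K):
        d2 = np.sum((Y - centres[k]) ** 2, axis=1)  # squared Euclidean distance
        scores[:, k] = weights[k] * np.exp(-d2 / (2.0 * variances[k]))
    return scores

# Assign each series to the highest-scoring cluster; normalisation does
# not change the ordering, so argmax over the raw scores is sufficient.
Y = np.array([[0.0, 0.1, 0.0], [1.0, 1.1, 0.9]])
centres = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
labels = membership_matrix(Y, centres, np.array([0.5, 0.5]),
                           np.array([0.2, 0.2])).argmax(axis=1)
print(labels)  # [0 1]
```

The score matrix plays the role of the prototype cluster membership matrix described above.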

Thresholding
We may apply thresholding if the clusters are too large or too noisy to draw conclusions from. Applying built-in functions to the prototype matrix is easily done in R. With ts.plot(), one can use trial and error to set the threshold until the clusters are visible or of the desired size. Since the prototype probabilities are unnormalised, we cannot treat the membership thresholds as probabilities (or credible intervals for the clusters); the constants needed for a probability interpretation are generally hard to calculate (cf. Bayes factors). However, the procedure is still analogous to a classic credible interval. For each cluster we threshold, we also create a 'junk' group of observations that are not strong members of any cluster. For extra information, we may iterate the RJMCMC to see if any interesting clusters form within this 'noise'.
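The thresholding step can be sketched as below. Again, this is a hypothetical illustration rather than the paper's R code, and the threshold value is chosen by trial and error as described above:

```python
import numpy as np

# Keep an observation in its best cluster only if its unnormalised
# membership score exceeds a user-chosen threshold; otherwise move it
# to a 'junk' group, labelled -1 here.

def threshold_assign(scores, threshold):
    """scores: (n, K) unnormalised prototype membership matrix."""
    labels = scores.argmax(axis=1)
    labels[scores.max(axis=1) < threshold] = -1  # weak members go to 'junk'
    return labels

scores = np.array([[0.90, 0.10],
                   [0.40, 0.50],
                   [0.02, 0.03]])
print(threshold_assign(scores, 0.2))  # [ 0  1 -1]
```

The 'junk' group (label -1) collects the weak members, which can then be fed back into the RJMCMC if desired.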

Analysis of the model posterior distribution
Next, we may compare the results from the individual models against the model posterior distribution. Since some components were left empty with the nested models (Figure 2b), their probability could be transferred to form the rectified model posterior distribution (Figure 3).

Comparison: TMixClust results
This subsection presents the results of applying TMixClust to the Toydata. Clusters found by TMixClust for K = 2, 3, 4 are shown in Figure 4, and the estimated mixture weights in Table 2.

Comparison of Random walk distributions
This short section concerns the choice of the smoothing distribution. We assume each mixture component is normally distributed, while the underlying random walk z_k is either Cauchy, Laplace, Normal, or Student's t distributed. The model, in terms of hierarchical priors, is as follows.
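Since the hierarchical-prior equations appear only in the typeset version, a generic sketch of the structure (the notation is assumed, not copied from the typeset equations; psi denotes the chosen smoothing distribution) could read:

```latex
% c_i is the cluster indicator of series i; psi is the chosen smoothing
% distribution (Cauchy, Laplace, Normal, or Student's t) with scale tau_k.
\begin{align*}
  y_{it} \mid c_i = k &\sim \mathrm{N}\!\left(z_{k,t},\, \sigma_k^2\right),\\
  z_{k,t} &\sim \psi\!\left(z_{k,t-1},\, \tau_k\right), \qquad t = 2, \dots, T.
\end{align*}
```

The heavier-tailed choices of psi allow occasional larger steps in the latent trend while still smoothing the typical increments.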

Mixture distributions comparison
In the following, we assume the latent centre processes follow an AR(1) model.
Each mixture component is assumed to be either Cauchy, Laplace, Normal, or Student's t distributed, while the underlying random walk z_k follows an AR(1) process. The model, in terms of hierarchical priors, is as follows.

Single component
The simulated data is shown in Figure 7, and the model posterior distribution and the rectified model posterior distribution in Figure 8. The estimated mixture component centres and the resulting clusters are shown in Figures 9a and 9b, respectively. A summary of the results is given in Table 3.

Figure 9: Estimated centres and cluster assignments given by the BELMM approach for the single-component data.

Two components
The simulated data is shown in Figure 10, and the model posterior distribution and the rectified model posterior distribution in Figure 11. The estimated mixture component centres and the resulting clusters are shown in Figures 12a and 12b, respectively. A summary of the results is given in Table 4.

Figure 12: Estimated centres and cluster assignments given by the BELMM approach for the two-component data.

Three components
The simulated data is shown in Figure 13, and the model posterior distribution and the rectified model posterior distribution in Figure 14. The estimated mixture component centres and the resulting clusters are shown in Figures 15a and 15b, respectively. A summary of the results is given in Table 5.

Table 5: Summary of the estimated and realised values of the mixture weights after assignment given by the BELMM approach for the three-component data.

Four components
The simulated data is shown in Figure 16, and the model posterior distribution and the rectified model posterior distribution in Figure 17. The estimated mixture component centres and the resulting clusters are shown in Figures 18a and 18b, respectively. A summary of the results is given in Table 6.

Figure 18: Estimated centres and cluster assignments given by the BELMM approach for the four-component data.

Five components
The simulated data is shown in Figure 19, and the model posterior distribution and the rectified model posterior distribution in Figure 20. The estimated mixture component centres and the resulting clusters are shown in Figures 21a and 21b, respectively. A summary of the results is given in Table 7.

French mortality
This section summarises the results from the empirical analysis of the French mortality data. The section is structured as follows: First, we consider Interpretation 1 of the data. We provide the results of the RJMCMC, an example Stan model, and the estimates for the mixture components and the assigned clusters, respectively. Based on these, we give an example of the rectified model posterior distribution. We show results for the slow versions of the Stan model (see Subsection 6.3), accompanied by results from the package TMixClust. Second, the same results are provided for Interpretation 2 of the data. In the model, we assume each mixture component is normally distributed, and the underlying random walk z_k is normally distributed as well. The model, in terms of hierarchical priors, is as follows.

Interpretation 1: 203 profiles of length 101
The French mortality data and the model posterior distribution are shown in Figures 22a and 22b, respectively. The rectified model posterior distribution is in Figure 25. The estimated mixture component centres and the resulting clusters for the seven-component model are shown in Figures 24a and 24b, respectively. A summary of the results is given in Table 8. For comparison, we illustrate how the number of samples included affects the model posterior distribution: Figure 23 shows the model posterior distribution for the first five models. On the left is the model posterior distribution from the last 4000 RJMCMC samples; on the right, from the full 8000-sample RJMCMC chain. We see that the chain has converged quickly. The estimated centres are shown in Figure 26. A summary of the results for the fast implementation is given in Table 9.

Comparison: TMixClust
To compare the clustering solutions and the estimated weights obtained with the BELMM approach, we provide some results for the package TMixClust.
Cluster results for 2-5 clusters are shown in Figure 27, and a summary of the results is given in Table 10.

Interpretation 2: 101 time series of length 203
The French mortality data and the model posterior distribution are shown in Figures 28a and 28b, respectively. The realised clusters for all eight models and the resulting clusters for the four- and eight-component models are shown in Figures 29a and 29b, respectively. A summary of the results is given in Table 11, and a summary for five models with the fast implementation is given in Table 12 for comparison.

Comparison: TMixClust
To compare the clustering solutions and the estimated weights obtained with the BELMM approach, we provide some results for the package TMixClust.
Cluster results for 2-5 clusters are shown in Figure 30, and a summary of the results is given in Table 13.

Table 13: TMixClust results for the French mortality data, Interpretation 2, summary.
Drosophila melanogaster embryogenesis time-series data set

Analysis on full data
The Drosophila melanogaster data is shown in Figure 31. The silhouette index given by NbClust (using the noted methods), which guided the choice of the prior range, and the model posterior distribution are shown in Figures 32a and 32b, respectively. The estimated mixture component centres and the resulting clusters for the eight models are shown in Figures 33a and 33b, respectively. A summary of the results is given in Table 14.
For comparison, the estimated mixture component centres and the resulting clusters are given for the 2-5 component models estimated with the fast implementation in Figures 34a and 34b, respectively. A summary for the 2-5 component models is given in Table 15. The thresholded clusters and the rest of the genes are plotted in Figures 35a and 35b, and a summary of the cluster sizes for different threshold values is given in Table 16.

Analysis on full data, 2-5 components

Table 16: Summary of the thresholded results given by the BELMM approach, Drosophila melanogaster embryogenesis data set.

Analysis on random subsets
Here, we show how the BELMM approach can be used to estimate the mixture component centres from random subsets of the data. A sample of 1000 random genes from the Drosophila melanogaster data is shown in Figure 36a, and a sample of 2000 random genes in Figure 37a. The model posterior distributions are given in Figures 36b and 37b, respectively (note that these are not the same figure). The estimated mixture component centres and the resulting clusters for the eight models for the 1000- and 2000-gene samples are shown in Figures 38a and 38b, and 39a and 39b, respectively. Summaries of the results for both samples are given in Tables 17 and 18.

Subsets of 1000 and 2000, 2-5 models, no prior on π_k

The results from the fast and slow implementations are provided for comparison with the results from the eight models. The model posterior distributions are given in Figures 40a and 40b, respectively. The estimated mixture component centres and the resulting clusters for the four models for the 1000- and 2000-gene samples are shown in Figures 41 and 42, and 43 and 44, respectively. Summaries of the results for both samples are given in Tables 19 and 21, respectively. For easier comparison, the results from the full data set are provided in Tables 20 and 22. The estimated mixture component centres from all four cases are plotted together over the data in Figure 45.

Influence of starting values on the posterior estimates
To study the stability of the results under random seeds, we ran our five-component model five times in parallel on a subset of 3000 genes using the included R script. We observed that the estimated mixture weights were consistent across the runs, with the centres differing slightly in one of the models (Figure 46).

6 Appendix: Software details

6.1 BELMM

For our implementation of the BELMM approach and the tutorial, please refer to the GitHub page at https://github.com/ollisa/BELMM. Currently, the RJMCMC depends on Stan and CU-MSDSp, of which the latter requires the use of Linux (alternatively, Windows Subsystem for Linux) or macOS. Therefore, we have not made a self-contained R package version of BELMM.
To give an idea of the system requirements: all the models for the simulated data sets, and models 2-5 at a time for the real data sets, were estimated on a four-core Intel i5-6600 system with 8 GB of RAM. The WSL tutorial was run on a twelve-core Intel i7-12700 system (using 8 cores for all examples) with 32 GB of RAM. The highest memory usage we have seen during the compilation of the models (8 at a time) under WSL is a little under 10 GB, making the total system usage 14 GB including the operating system. If one uses the plotting option for our models in CU-MSDSp, the software can easily use more than 32 GB of RAM. We do not recommend using the plotting functionality for long time series, such as the French mortality data set, as it can crash the terminal.
To give an example of the runtime, consider the 2000-gene sample of the Drosophila melanogaster data set. The total estimation times on the i5 system were 3624.9 s for the fast implementation and 17106.1 s for the slow one; in both cases, 3100 s were spent running the RJMCMC, the difference being only the slower sampling during the estimation of the models. Under WSL on the i7 system, the fast implementation finished in 1018.3 s and the slow one took 9307.8 s for the same data and models. With further optimisation of the model structure (splitting up the for-loops), the slow implementation with four extra models, now 8 in total, takes 2053 s on the i7 system for the 2000-gene sample and 7628.7 s for the complete data set.

R
We assume the user is familiar with R software and R programming. Some notes on using Stan: our implementation requires users to write models in the Stan programming language. Generating the individual model files is easy using R. For small changes, it is also possible to create copies of the files and edit them using software like Notepad.

Stan
The model itself is composed of blocks. The 'data' block declares the dimensions of the data, possible fixed prior values, and the time-series data itself. Making data transformations in Stan using the 'transformed data' block is also possible. Any sampled model parameters are declared in the 'parameters' block, and any variables dependent on the parameters can be created in the 'transformed parameters' block for more efficient computation. Model dependencies, such as the prior distributions and their contribution to the Jacobian, are defined in the 'model' block. Last, the optional 'generated quantities' block can generate values based on the sampled parameters; we use it to create the model's log-likelihood for the RJMCMC.
Stan uses a JSON-like file format for including the data. In practice, this means writing the data into a list in R and including the dimensions of the data and other fixed values, such as the smoothing priors used in our implementation. Many nested models may be required to find the optimal number of components, so it is beneficial to include in the Stan data file one vector containing the candidate numbers of components. This reduces the risk of making mistakes in the individual models if the model structure allows it, as in the case of nested models, and can be implemented with small changes to the 'data' block. Extracting the parameters later using R is also possible. Also, manual plotting is necessary if any empty components are found.
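The shared data file can be sketched as below. The paper's tutorial does this in R; here is an equivalent Python sketch, with hypothetical field names chosen for illustration:

```python
import json

# Collect the time-series data, its dimensions, the smoothing-prior value,
# and the candidate numbers of components into one structure, then write it
# to a JSON file that all the nested Stan models can share.
data = {
    "N": 3,                      # number of time series
    "T": 4,                      # length of each series
    "y": [[0.1, 0.2, 0.2, 0.3],  # the observed series, one row per series
          [1.0, 0.9, 1.1, 1.2],
          [0.0, 0.1, 0.0, 0.1]],
    "tau": 0.1,                  # fixed smoothing-prior scale
    "K": [2, 3, 4],              # candidate numbers of mixture components
}

with open("stan_data.json", "w") as f:
    json.dump(data, f)

with open("stan_data.json") as f:
    print(json.load(f)["K"])  # [2, 3, 4]
```

Keeping the candidate numbers of components in one vector means every nested model reads the same file, which is the mistake-reducing convention described above.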
The chains on individual models are contained in the 'goldStandardChains' folder.The results from RJMCMC can be found in the 'modelSelection' folder.The run time is in the 'timings' file; any plotted figures are found in the 'pics' folder.

TMixClust
Quoting Golumbeanu [2020], "TMixClust is a soft-clustering method which employs mixed-effects models with nonparametric smoothing spline fitting and is able to robustly stratify genes by their complex time series patterns. The package has, besides the main clustering method, a set of functionalities assisting the user to visualise and assess the clustering results, and to choose the most optimal clustering solution".
Their model is a mixed-effects mixture in which the cluster centre µ_k is a cubic smoothing spline. The covariance matrix consists of the cluster-specific random effects θ_k and normally distributed random errors θ, so that Σ_k = I·θ_k + 1·θ. The model with a fixed number of components is estimated using the EM algorithm. At the time of writing, the package does not implement any model selection functionality.
To overcome the EM algorithm's tendency to find only local maxima, the package can run the estimation procedure multiple times (e.g. 10, 20, ...) and keep the solution with the highest likelihood. It can also perform a clustering stability analysis by calculating the Rand index, which measures how well the different runs agree with the best-fitting solution. However, for unknown reasons, the implementation in the function "analyse stability" does not work correctly for the French mortality data.
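The Rand index used in such a stability analysis can be sketched as follows. This is the generic unadjusted Rand index, not TMixClust's internal code:

```python
from itertools import combinations

# The Rand index is the proportion of observation pairs on which two
# clusterings agree: both place the pair together, or both keep it apart.

def rand_index(a, b):
    """a, b: cluster labels for the same observations, in the same order."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, relabelled)
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # ~0.33 (2 of 6 pairs agree)
```

Because the index compares pairs rather than labels, it is invariant to relabelling of the clusters, which is exactly what comparing independent EM runs requires.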

Software versions
Here, we list versions of some notable software.

Figure 2 :
Figure 2: Estimated centres of the mixture components plotted over the data and the resulting clusters given by the BELMM approach. Results from left to right for K = 2, 3, 4, respectively.

Figure 3 :
Figure 3: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution.
, where ψ is the Cauchy, Laplace, Normal, or Student's t distribution. The estimated centres and the associated model posterior distribution are shown in Figure 5.

Figure 5 :
Figure 5: Comparison of the random walk smoothing distributions: Cauchy, Laplace, Normal, and Student's t.
, where ψ is the Cauchy, Laplace, Normal, or Student's t distribution. The degrees-of-freedom value we used was T − 1 = 5. The estimated centres and the associated model posterior distribution are shown in Figures 6a and 6b, respectively.

Figure 7 :
Figure 7: Plot of individual time series from the simulated single-component data.

Figure 8 :
Figure 8: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution, estimated from the single-component data.

Figure 10 :
Figure 10: Plot of individual time series from the simulated two-component data.

Figure 11 :
Figure 11: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution, estimated from the two-component data.

Figure 13 :
Figure 13: Plot of individual time series from the simulated three-component data.

Figure 14 :
Figure 14: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution, estimated from the three-component data.

Figure 15 :
Figure 15: Estimated centres and cluster assignments given by the BELMM approach for the three-component data.

Figure 16 :
Figure 16: Plot of individual time series from the simulated four-component data.

Figure 17 :
Figure 17: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution, estimated from the four-component data.

Figure 19 :
Figure 19: Plot of individual time series from the simulated five-component data.

Figure 20 :
Figure 20: Original model posterior distribution given by CU-MSDSp and the associated rectified model posterior distribution, estimated from the five-component data.

Figure 21 :
Figure 21: Estimated centres and cluster assignments given by the BELMM approach for the five-component data.
Figure 22: Log-transformed French mortality data and the model posterior distribution for the slow implementation.

Figure 23 :
Figure 23: The French mortality data: Model posterior distribution for five models with the slow implementation.

Figure 24 :
Figure 24: Estimates of the mixture components' centres for eight models with the slow implementation given by BELMM. Centres are in the top plot; the realised clusters for the seven-component model are in the bottom plot.

Figure 25 :
Figure 25: The rectified model posterior distribution for the slow implementation.

Figure 27 :
Figure 27: Results given by TMixClust method for the French mortality data.

Figure 28 :
Figure 28: The French mortality, Interpretation 2. The data set and the model posterior distribution.

Figure 29 :
Figure 29: The French mortality, Interpretation 2. Estimated centres and assigned clusters given by BELMM approach.
(a) Prior range. (b) Model posterior distribution.

Figure 32 :
Figure 32: Prior range for the number of components given by NbClust [Charrad et al., 2014] and Model posterior distribution given by CU-MSDSp, Drosophila melanogaster embryogenesis.

Figure 33 :
Figure 33: Drosophila melanogaster embryogenesis data set; Centres and clusters given by BELMM approach.

Figure 34 :
Figure 34: Drosophila melanogaster embryogenesis data set; Centres and clusters given by BELMM approach.

Figure 36 :
Figure 36: Sample of 1000 genes and the Model posterior distribution given by CU-MSDSp, Drosophila melanogaster embryogenesis.

Figure 37 :
Figure 37: Sample of 2000 genes and the Model posterior distribution given by CU-MSDSp, Drosophila melanogaster embryogenesis.

Figure 38 :
Figure 38: Results for the slow implementations of the model given by the BELMM approach.A sample of 1000, Drosophila melanogaster embryogenesis data set.

Figure 39 :
Figure 39: Results for the slow implementations of the model given by the BELMM approach.A sample of 2000, Drosophila melanogaster embryogenesis data set.

Figure 40 :
Figure 40: Model posterior distributions given by CU-MSDSp for fast and slow implementations of the model.A sample of 1000 and 2000, Drosophila melanogaster embryogenesis data set.

Figure 41 :
Figure 41: Results for the fast implementations of the model given by BELMM approach.A sample of 1000, Drosophila melanogaster embryogenesis data set.

Figure 42 :
Figure 42: Results for the slow implementations of the model given by the BELMM approach.A sample of 1000, Drosophila melanogaster embryogenesis data set.

Figure 43 :
Figure 43: Results for the fast implementations of the model given by the BELMM approach.A sample of 2000, Drosophila melanogaster embryogenesis data set.

Figure 44 :
Figure 44: Results for the slow implementations of the model given by the BELMM approach.A sample of 2000, Drosophila melanogaster embryogenesis data set.

Figure 45 :
Figure 45: Centres given by the BELMM approach, fast and slow implementations, summary, Drosophila melanogaster embryogenesis data set.

Figure 46 :
Figure 46: Summary of estimates given by BELMM, fast implementation, Drosophila melanogaster embryogenesis data set, sample of 3000.

Table 1 :
Summary of the posterior estimated weights and their realisations after the assignments given by the BELMM approach, TMixClust Toydata.

Table 4 :
Summary of the estimated and realised values of the mixture weights after assignment given by the BELMM approach for the two-component data.

Table 6 :
Summary of the estimated and realised values of the mixture weights after assignment given by the BELMM approach for the four-component data.

Table 7 :
Summary of the estimated and realised values of the mixture weights after assignment given by the BELMM approach for the five-component data.

Table 8 :
Summary of the seven and eight component results given by BELMM, French mortality, slow implementation.

Table 10 :
Summary of the results given by TMixClust method, French mortality data.

Table 11 :
The French mortality, Interpretation 2. Summary of the results for the eight models given by the BELMM approach.

Table 12 :
The French mortality, Interpretation 2. Summary of the results given by fast implementation of the BELMM approach.

Table 14 :
Summary of estimates and realised values given by the BELMM, Drosophila melanogaster embryogenesis, full data set.

Table 15 :
Summary of results given by the BELMM approach, Drosophila melanogaster embryogenesis data set.

Table 17 :
Summary of estimates given by BELMM, Drosophila melanogaster embryogenesis, sample of 1000.

Table 18 :
Summary of estimates given by BELMM, Drosophila melanogaster embryogenesis, sample of 2000.

Table 19 :
Summary of estimates for 2-5 models given by the BELMM approach, Drosophila melanogaster embryogenesis, sample of 1000.

Table 20 :
Summary of estimates for 2-5 models given by the BELMM approach, Drosophila melanogaster embryogenesis full data set.

Table 21 :
Summary of estimates for 2-5 models given by the BELMM approach, Drosophila melanogaster embryogenesis, sample of 2000.

Table 22 :
Summary of estimates for 2-5 models given by the BELMM approach, Drosophila melanogaster embryogenesis full data set.