Nunchaku: optimally partitioning data into piece-wise contiguous segments

Abstract

Motivation: When analyzing 1D time series, scientists are often interested in identifying regions where one variable depends linearly on the other. Typically, they use an ad hoc and therefore often subjective method to do so.

Results: Here, we develop a statistically rigorous, Bayesian approach to infer the optimal partitioning of a dataset not only into contiguous piece-wise linear segments, but also into contiguous segments described by linear combinations of arbitrary basis functions. We therefore present a general solution to the problem of identifying discontinuous change points. Focusing on microbial growth, we use the algorithm to find the range of optical density where this density is linearly proportional to the number of cells and to automatically find the regions of exponential growth for both Escherichia coli and Saccharomyces cerevisiae. For budding yeast, we consequently are able to infer the Monod constant for growth on fructose. Our algorithm lends itself to automation and high-throughput studies, increases reproducibility, and should facilitate data analyses for a broad range of scientists.

Availability and implementation: The corresponding Python package, entitled Nunchaku, is available on PyPI: https://pypi.org/project/nunchaku.


Introduction
A common scientific problem is understanding the relationship between two variables. When the dependent variable, or some transformation of it, depends linearly on the independent variable, the underlying system linking the two often behaves more simply than it does in general. As a consequence, scientists commonly focus their efforts on identifying and understanding this linear regime. A well-known example is the growth of a population of cells. In log phase, when the logarithm of the number of cells increases linearly with time, the total mass of every intracellular component grows exponentially and the mass per cell is approximately constant. Such steady-state conditions regularize growth: metabolic fluxes are balanced, and physiology simplifies, generating behaviours controlled by only a handful of variables (Scott and Hwa 2023).
Biologists therefore often wish to determine when growth is in log phase. Historically, the approach has been to plot the logarithm of a variable correlating with the number of cells, such as optical density (OD), against time and to identify a linear region by eye (Monod 1949). Today this subjective technique is still used, with one scientist's linear region not necessarily the same as another's.
A challenge to developing objective approaches is identifying a suitable nonlinear model with which to compare the linear one. There is no general way to describe all relationships that we may observe. With a mechanistic understanding, we might generate a nonlinear description, but such an understanding is often lacking and, anyhow, may obviate the need to find linear regimes.
Here, we circumvent this problem by inferring the piece-wise linear description that best approximates an entire 1D time series. By doing so, we reframe the task as one of detecting change points: time points where the process generating the time series changes, a well-studied problem (Stephens 1994) with an established frequentist solution (Baranowski et al. 2019). We use a Bayesian approach, complementing others (Hutter 2007, Papastamoulis et al. 2019), and generalize by allowing each segment of data to be described by a linear combination of arbitrary basis functions, with straight lines being but one example. For a given set of basis functions, we compare the evidence for every possible piece-wise linear combination, found by marginalizing over all possible fits to all possible contiguous subdivisions of the data. For linear segments and for the optimal choice of segments, we provide statistics for each segment, allowing users to select straightforwardly the segment or segments of most interest. To illustrate our algorithm, we primarily discuss two examples: determining the range of OD of a liquid culture where the OD depends linearly on the number of cells, and finding the exponential phases of microbial growth curves.

Inferring contiguous regions using model comparison
Given 1D time-series data and a set of K basis functions, we wish to divide the data into the group of contiguous segments that is best characterized by piece-wise linear combinations of the basis functions. Irrespective of the data's behaviour, we will always find such a group. Our approach answers two questions: how many piece-wise contiguous segments best describe the data given the basis functions, and where the optimal segment boundaries lie.
Let us assume that we have observations (x_j, y_j^(r)), where j runs from 1 to N and the x_j are in ascending order; r indexes the N_r replicates, if any. We denote these observations collectively as D.
First, we consider whether we should divide the data into M or M′ segments, using Bayesian model comparison (MacKay 2003). Assuming equal prior probabilities, P(M) = P(M′), we write the Bayes factor as P(M|D)/P(M′|D) = P(D|M)/P(D|M′), and therefore we should determine the evidence P(D|M) for each M.
The evidence is a marginal likelihood. For M contiguous segments, there are M − 1 unknown boundary points, which we denote as n ≡ (n_1, …, n_{M−1}) with n_i < n_{i+1}. These points are integers and index an x_j. The two remaining boundaries are the indices of the first and last x values: 1 and N. We assume that each segment contains a minimal number of data points ℓ_min, so that n_{i+1} ≥ n_i + ℓ_min. The choice of ℓ_min depends on the type and number of basis functions: in general, ℓ_min ≥ K.
The evidence is a sum over all potential n:

P(D|M) = Σ_n P(D|n, M) P(n|M),

where we use that any permissible n_i is as likely as any other to write the prior P(n|M) as a function of N, M, and ℓ_min. Specifically, this bounded uniform prior is the reciprocal of the number of possible n, that is, of the boundary vectors satisfying n_{i+1} ≥ n_i + ℓ_min for the given M and ℓ_min.

Second, for a given M and n, we fit the data to M different linear combinations of the basis functions, one for each segment, with each combination independent of the others. The linear combination ending near the data points indexed by n_i and n_{i+1} depends only on the data indexed by n_i + 1 to n_{i+1} inclusive, denoted D_i, and this data does not determine any other linear combination. Therefore, writing n_0 ≡ 0 and n_M ≡ N,

P(D|n, M) = Π_{i=0}^{M−1} P(D_i | n_i + 1, n_{i+1}),

where P(D_i | n_i + 1, n_{i+1}) is the likelihood of a linear combination of the basis functions describing the data indexed by n_i + 1 to n_{i+1}.
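The admissible boundary vectors correspond to compositions of N into M parts, each of size at least ℓ_min, so their number has a closed form. A small sketch (function names ours, not nunchaku's API) checks this binomial-coefficient count, whose reciprocal gives the uniform prior P(n|M), against brute-force enumeration:

```python
from itertools import combinations
from math import comb

def count_boundaries(N, M, l_min):
    """Closed-form count of boundary vectors n for M segments of N points,
    each segment containing at least l_min points: the number of
    compositions of N into M parts, each part >= l_min."""
    return comb(N - M * l_min + M - 1, M - 1)

def enumerate_boundaries(N, M, l_min):
    """Brute-force enumeration: n_i is the index of the last point of
    segment i, so segment lengths are the gaps between boundaries."""
    count = 0
    for n in combinations(range(1, N), M - 1):
        bounds = (0,) + n + (N,)
        if all(b - a >= l_min for a, b in zip(bounds, bounds[1:])):
            count += 1
    return count

# The reciprocal of this count is the uniform prior P(n | M).
N, M, l_min = 20, 3, 3
assert count_boundaries(N, M, l_min) == enumerate_boundaries(N, M, l_min)
print(count_boundaries(N, M, l_min))  # 78
```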
2.1.1 Finding P(D|n, M)

For each segment of the data, we consider the K basis functions, each individually denoted φ_k(x) and collectively φ(x), and correspondingly K coefficients, each denoted m_k. If fitting lines, we have two basis functions, φ_1 = 1 and φ_2 = x, and two coefficients, with m_1 determining the line's y-intercept and m_2 its gradient. We then set ℓ_min = 3 so that there are sufficient data points in each segment to define a line. We let P(y_j|x_j, m) describe how a data point y_j at x_j deviates from the linear combination of basis functions and assume that this deviation is independent of the deviations of other data points.
For the ith segment, we then have

P(D_i | n_i + 1, n_{i+1}) = ∫ dm P(m) Π_j P(y_j | x_j, m),

assuming the prior P(m) is a constant, with each m_k independently and uniformly distributed in a bounded region, m_k ∈ [m_k^min, m_k^max].

2.1.2 Marginalizing P(D|n, M)

Using Equation (5), we factorize the sum in Equation (2):

P(D|M) ∝ Σ_{n_1} P(D_0 | 1, n_1) Σ_{n_2} P(D_1 | n_1 + 1, n_2) ⋯ Σ_{n_{M−1}} P(D_{M−2} | n_{M−2} + 1, n_{M−1}) P(D_{M−1} | n_{M−1} + 1, N),

and use the method of variable elimination (Zhang and Poole 1996) to evaluate these sums. First we perform the rightmost sum, over n_{M−1}, to generate a function of n_{M−2}. We then perform the next rightmost sum, over n_{M−2}, of this function multiplied by the next term in Equation (8), which generates a function of n_{M−3}. We repeat this process until we reach the leftmost sum over n_1, requiring O(MN²) operations in total instead of O(N^M). We evaluate Equation (4) similarly.
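The elimination order just described is a dynamic program over a table indexed by segment count and right boundary. A minimal sketch (our own code; `seg_loglik` is a stand-in for the per-segment marginal likelihood, not nunchaku's API) illustrates the O(MN²) recursion and checks it against direct enumeration:

```python
import numpy as np
from itertools import combinations

def logsumexp(v):
    """Numerically stable log of a sum of exponentials."""
    v = np.asarray(v, dtype=float)
    c = v.max()
    return c + np.log(np.sum(np.exp(v - c)))

def log_evidence(seg_loglik, N, M, l_min):
    """Sum P(D|n, M) over all boundary vectors n by variable elimination.
    E[m, j] = log of the sum over all ways of splitting points 1..j into
    m segments; filling the table costs O(M N^2) rather than O(N^M).
    seg_loglik(a, b) must return log P(D_i | a, b) for the segment of
    points a..b inclusive (1-based)."""
    E = np.full((M + 1, N + 1), -np.inf)
    for j in range(l_min, N + 1):
        E[1, j] = seg_loglik(1, j)
    for m in range(2, M + 1):
        for j in range(m * l_min, N + 1):
            E[m, j] = logsumexp([E[m - 1, i] + seg_loglik(i + 1, j)
                                 for i in range((m - 1) * l_min, j - l_min + 1)])
    return E[M, N]

def log_evidence_brute(seg_loglik, N, M, l_min):
    """Direct enumeration of all boundary vectors, for checking."""
    terms = []
    for n in combinations(range(1, N), M - 1):
        b = (0,) + n + (N,)
        if all(q - p >= l_min for p, q in zip(b, b[1:])):
            terms.append(sum(seg_loglik(p + 1, q) for p, q in zip(b, b[1:])))
    return logsumexp(terms)

toy = lambda a, b: -0.3 * abs((b - a + 1) - 4)  # stand-in segment log-likelihood
```

Multiplying the result by the uniform prior P(n|M) then gives the evidence of Equation (2).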
All that remains is to determine P(D_i | n_i + 1, n_{i+1}) so that we can find P(D|M) via Equations (2) and (8).

Finding P(D_i | n_i + 1, n_{i+1})
To proceed, we assume that P(y_j|x_j, m) is a normal distribution with mean φ(x_j)ᵀm, or equivalently Σ_k m_k φ_k(x_j), and standard deviation σ_j. If we know the σ_j, e.g. by approximating each by the corresponding measurement error, then Equation (6), the likelihood of a linear combination describing the data indexed by n_i + 1 to n_{i+1}, becomes a Gaussian integral over the coefficients m. To evaluate the integral, we extend it to infinite range for all m_k, a suitable approximation because we expect the integrand to be strongly peaked at the most likely values of each m_k (MacKay 2003). We can then perform the integration analytically.
Consider data with a single replicate. Define ℓ_i = n_{i+1} − n_i to be the number of x values in the ith segment and z^(i) to be the vector with components y_j/σ_j, with the superscript i denoting the ith segment. Let Φ^(i) be the K × ℓ_i matrix with components Φ_kj = φ_k(x_j)/σ_j, and further define A^(i) = Φ^(i)Φ^(i)ᵀ and b^(i) = Φ^(i)z^(i). The matrix A^(i) is a symmetric K × K matrix, which is invertible when the basis functions φ_k are linearly independent and when ℓ_i ≥ K. Standard algebra and the results for integrating multivariate Gaussian distributions (MacKay 2003) then give the integral over m as

(2π)^(K/2) det(A^(i))^(−1/2) e^(−U^(i)), with U^(i) = ½(z^(i)ᵀz^(i) − b^(i)ᵀA^(i)⁻¹b^(i)).

If we are fitting straight lines, with K = 2, φ_1 = 1, and φ_2 = x, it is useful to define sums T_1, …, T_6 over the segment's data (Hinrichsen et al. 2017), with j running from n_i + 1 to n_{i+1}; the integral then becomes (2π)(4T_2T_3 − T_6²)^(−1/2) e^(−U^(i)). With more than one replicate, z runs over all y in all replicates, with the replicates arranged contiguously, and is of length N_r ℓ_i; Φ has rows of length N_r ℓ_i, with x_{n_i+1} to x_{n_{i+1}} repeated N_r times in each row to match the corresponding y values. For the linear case, the sums in Equation (14) run over both j and the replicates. Returning to Equation (9), we find P(D_i | n_i + 1, n_{i+1}) with the help of Equation (13). For this approximation to be valid, we require that the strongly peaked region in m space lies within the a priori range of m. The area under the integrand in Equation (13) is proportional to the square root of det A^(i), and the prior range of m must be large enough to contain this area. From the posterior probability P(n|D, M), we estimate each boundary point by its posterior mean, E[n_i]. The posterior variance, Var[n_i], determines the error in this estimate, which we find similarly.
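The closed-form Gaussian integral can be checked numerically. The sketch below (our own code, not nunchaku's) computes, for a straight-line segment with known σ_j, the analytic log of ∫ dm Π_j N(y_j; m_1 + m_2 x_j, σ_j²) and compares it against brute-force quadrature over the two coefficients:

```python
import numpy as np

def log_segment_integral(x, y, sigma):
    """Analytic value of log ∫ dm Π_j N(y_j; m1 + m2 x_j, σ_j²), the
    coefficient integral extended to infinite range: with
    Φ_kj = φ_k(x_j)/σ_j, z_j = y_j/σ_j, A = Φ Φᵀ and b = Φ z, the
    result is Π_j (2π σ_j²)^(−1/2) · (2π)^(K/2) det(A)^(−1/2) e^(−U),
    where U = ½ (zᵀz − bᵀ A⁻¹ b)."""
    z = y / sigma
    Phi = np.vstack([np.ones_like(x), x]) / sigma          # K = 2 for lines
    A = Phi @ Phi.T
    b = Phi @ z
    U = 0.5 * (z @ z - b @ np.linalg.solve(A, b))
    return (-np.sum(np.log(np.sqrt(2 * np.pi) * sigma))
            + np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(A)[1] - U)

def log_segment_integral_grid(x, y, sigma, half=3.0, n=301):
    """Brute-force trapezoidal quadrature over (m1, m2), for checking;
    the grid is centred on the least-squares solution."""
    D = np.vstack([np.ones_like(x), x]).T
    m_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
    m1 = m_hat[0] + np.linspace(-half, half, n)
    m2 = m_hat[1] + np.linspace(-half, half, n)
    pred = m1[:, None, None] + m2[None, :, None] * x
    logp = np.sum(-0.5 * ((y - pred) / sigma) ** 2
                  - np.log(np.sqrt(2 * np.pi) * sigma), axis=-1)
    c = logp.max()
    w = np.ones(n); w[0] = w[-1] = 0.5                    # trapezoid weights
    h = 2 * half / (n - 1)
    return c + np.log(h * h * np.sum(w[:, None] * w[None, :] * np.exp(logp - c)))

rng = np.random.default_rng(0)
x = np.arange(8.0)
sigma = np.full(8, 0.5)
y = 1.0 + 0.4 * x + rng.normal(0, 0.5, 8)
```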
2.1.5 Finding P(D|M) for unknown measurement error

If the σ_j are unknown, we assume the same constant σ for all j, with a uniform prior probability on [σ_min, σ_max] (Gelman 2006), and Equation (2) gains a marginalization over σ. The constant P(σ) = 1/(σ_max − σ_min) will cancel in Equation (1) when we compare the evidence for different M.
Using the equivalents of Equations (9) and (13), we find the segment likelihood as an explicit function of σ (Equation 22), where we now set the σ_j in Equation (10) to unity, making z the vector of y values and Φ the matrix with components φ_k(x_j). Similarly, for the linear case, the σ_j become unity in Equation (14). Although with Equation (22) it is possible to approximate analytically the integral over σ in Equation (20) by extending the range of the integrand to (0, ∞), the resulting expression prevents us from summing over n using variable elimination. Instead, we swap the sum and the integral to write

P(D|M) = ∫ dσ P(σ) Σ_n P(n|M) P(D|n, M, σ),

which we evaluate numerically, using variable elimination to sum over n in Equation (23) for each σ chosen by the integration algorithm.
We find the expected boundary points via Equation (19), again numerically integrating over σ.
Performing the integration: to stabilize the numerical integration, we scale the integrand of Equation (23) by its value at the most likely value of σ, making the integrand nearly always less than one and preventing overflow. We use expectation-maximization (EM) to estimate the most likely σ for a given M. The EM algorithm finds the σ that maximizes P(D|M, σ) (Bishop 2006). We guess a value of σ, σ_o say, and find P(n|D, σ_o, M) from Equation (18). To update σ_o, we maximize with respect to σ

Q(σ, σ_o) = E[log P(D, n | σ, M)],

with the expectation taken over P(n|D, M, σ_o). Expanding Equation (24) using Equation (22), only two terms depend on σ, and we can differentiate to find the updated σ = σ_n. We use the equivalent of Equation (19) with σ = σ_o to evaluate these expectations and iterate until the value of σ converges.
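The overflow-safe scaling described here amounts to working in log space and subtracting the maximum of the log integrand before exponentiating. A generic sketch (our own code, with an artificial integrand whose direct exponential would overflow):

```python
import numpy as np

def log_integral_scaled(log_f, grid):
    """log ∫ f(σ) dσ on a uniform grid by the trapezoid rule, scaling the
    integrand by its largest value so exponentiation cannot overflow."""
    logv = np.array([log_f(s) for s in grid])
    c = logv.max()                           # log f at the grid's most likely σ
    w = np.ones(grid.size); w[0] = w[-1] = 0.5   # trapezoid weights
    h = grid[1] - grid[0]
    return c + np.log(h * np.sum(w * np.exp(logv - c)))

# A log-integrand too large to exponentiate directly: a normalized
# Gaussian in σ plus an offset of 5000, so exp(log_f) would overflow.
log_f = lambda s: (5000.0 - 0.5 * ((s - 1.0) / 0.1) ** 2
                   - np.log(0.1 * np.sqrt(2 * np.pi)))
sigmas = np.linspace(0.5, 1.5, 2001)
```

Because the Gaussian integrates to one over this range, the result should be very close to 5000.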

Implementation
For basis functions that generate lines, we compare the different linear segments by calculating the gradient, the intercept, and the coefficient of determination R² of the line maximizing the likelihood for each segment. The user can then select a desired segment, such as the one with the largest gradient.
The algorithm requires the a priori bounded region of m in Equation (7). Again specializing to straight lines, the prior specifies the ranges of the intercept m_1 and the gradient m_2: [m_1^min, m_1^max] and [m_2^min, m_2^max]. The user can provide both ranges, only the range of m_2, or the maximal range of y possible in the experiment, [y_min, y_max]. If the user provides only the range of m_2, we estimate m_1^min as min(−m_2^max x_max, m_2^min x_min) and m_1^max as max(−m_2^min x_max, m_2^max x_min). If the user provides the range of y, we estimate the range of m_2 as [−g_max, g_max], with g_max = (y_max − y_min)/Δx_min and Δx_min the smallest difference between two neighbouring x values.
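These defaults are easy to state in code. A sketch (function names ours, not the package's API) of the two estimates:

```python
import numpy as np

def gradient_range(y_min, y_max, x):
    """Default prior for the gradient m2 from the possible range of y:
    [-g_max, g_max] with g_max = (y_max - y_min) / dx_min."""
    dx_min = np.min(np.diff(np.sort(np.asarray(x, dtype=float))))
    g_max = (y_max - y_min) / dx_min
    return -g_max, g_max

def intercept_range(m2_min, m2_max, x_min, x_max):
    """Default prior for the intercept m1 given the gradient range,
    following the estimates quoted in the text."""
    m1_min = min(-m2_max * x_max, m2_min * x_min)
    m1_max = max(-m2_min * x_max, m2_max * x_min)
    return m1_min, m1_max
```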

Availability
We coded the algorithm as a Python package, available at https://pypi.org/project/nunchaku and installable via pip. We have also embedded nunchaku into our omniplate software for analyzing plate-reader data (Montaño-Gutierrez et al. 2022).

Generating and testing with synthetic data
To test our method, we generated piece-wise linear functions f(x) with 1 ≤ M ≤ 10 continuous linear segments, each having between 10 and 50 data points and a unit distance, Δx = 1, between data points. We sampled θ, the angle between each segment and the x-axis, from a uniform distribution on the interval [−tan⁻¹(20), tan⁻¹(20)], so that the gradient, tan θ, lies in [−20, 20]. Furthermore, we ensured that the difference in θ between neighbouring segments is larger than a fixed minimum, θ_0. We added Gaussian noise, ε ∼ Normal(0, σ²), to give three replicates of y = f(x) + ε. We generated 3600 synthetic datasets in total: a combination of 200 different piece-wise linear functions f(x), three values of θ_0, and six values of σ. In Figs 1 and 2, θ_0 = 10°.
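Synthetic data in this spirit can be generated as follows (our own simplified generator, not the paper's code; resampling rejected angles is one way to enforce the minimum angle change):

```python
import numpy as np

rng = np.random.default_rng(1)

def piecewise_linear(M, n_min=10, n_max=50, theta_min=np.radians(10)):
    """Continuous piece-wise linear f(x) with M segments, dx = 1,
    gradients tan(theta) in [-20, 20], and a minimum angle change
    theta_min between neighbouring segments."""
    lengths = rng.integers(n_min, n_max + 1, size=M)
    thetas = [rng.uniform(-np.arctan(20), np.arctan(20))]
    while len(thetas) < M:
        t = rng.uniform(-np.arctan(20), np.arctan(20))
        if abs(t - thetas[-1]) > theta_min:   # reject too-similar angles
            thetas.append(t)
    x_parts, y_parts, x0, y0 = [], [], 0.0, 0.0
    for L, t in zip(lengths, thetas):
        xs = x0 + np.arange(L)
        ys = y0 + np.tan(t) * (xs - x0)
        x_parts.append(xs); y_parts.append(ys)
        x0, y0 = xs[-1] + 1, ys[-1] + np.tan(t)   # continuity at the join
    return np.concatenate(x_parts), np.concatenate(y_parts)

x, f = piecewise_linear(M=4)
replicates = [f + rng.normal(0, 0.5, size=f.size) for _ in range(3)]
```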

Experimental methods
We used a prototrophic strain of Saccharomyces cerevisiae (FY4), precultured in synthetic complete (SC) medium with 2% (w/v) sodium pyruvate in a 30 °C shaking incubator at 180 rpm for two days. Before the experiment, we diluted the cells 6-fold and let them grow for six hours. After washing the cells twice with fresh minimal media (Verduyn et al. 1992), we inoculated them into minimal media with different concentrations of fructose on a 96-well microplate. The liquid volume of each well was 200 µl.
For Escherichia coli, we precultured cells in 3 ml of liquid Luria broth (LB), inoculated with one colony from a fresh plate, and grew them aerobically to log phase (6 h) at 37 °C with 250 rpm shaking. We then inoculated 3 µl of culture into 147 µl of fresh LB medium per well on a 96-well microplate.
We used either a Tecan Infinite M200 Pro or F200 plate reader at 30 °C for S. cerevisiae and 37 °C for E. coli, with linear shaking at an amplitude of 6 mm. Measurements of absorbance at 600 nm, OD600, were taken every 10 min.

Fitting Monod's equation
After estimating the specific growth rate k at each concentration of fructose s, we have a dataset D ≡ {(k_i, s_i)} with 38 data points. We use Bayesian inference to estimate the constants k_max and K_M of Monod's equation. Assuming a Gaussian measurement error in k with a standard deviation σ and independent measurements, the likelihood is P(D|k_max, K_M, σ) = Π_i Normal(k_i; k_max s_i/(K_M + s_i), σ²). To marginalize over σ, we assume P(σ) ∝ 1/σ, so that P(D|k_max, K_M) ∝ [Σ_i (k_i − k_max s_i/(K_M + s_i))²]^(−N/2). We further assume that the prior P(k_max, K_M) is uniform, and so the posterior probability of k_max and K_M is proportional to the likelihood, Equation (27). We therefore maximize the likelihood with respect to k_max and K_M using the BFGS algorithm. We estimate the errors in these inferences using the diagonal elements of the Hessian matrix −∇∇ log P(D|k_max, K_M) evaluated at the maximum of the likelihood (MacKay 2003).

Figure 2. Nunchaku performs as well as or better than the NOT algorithm (Baranowski et al. 2019). This algorithm supports only one y value for each x value: we therefore input either one replicate or the mean of three replicates. The data are generated similarly to those in Fig. 1 (Section 2). As a prior for nunchaku, we specify that the gradient of each line lies between [−25, 25]. (A) The root mean squared error (RMSE) between the ground truth and the best-fit lines. (B) The difference between the predicted number of segments M* and the ground truth M (left) and the percentage of correct predictions with M* = M (right).
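With P(σ) ∝ 1/σ, the integral over σ gives a marginal likelihood proportional to SSR^(−N/2), where SSR = Σ_i (k_i − k_max s_i/(K_M + s_i))², so maximizing the marginal likelihood is equivalent to minimizing the sum of squared residuals. A sketch on synthetic data (all numbers ours), using a coarse grid search as a stand-in for BFGS:

```python
import numpy as np

def monod(s, k_max, K_M):
    """Monod's equation: specific growth rate versus nutrient concentration."""
    return k_max * s / (K_M + s)

def neg_log_marglik(params, s, k):
    """Up to a constant: with prior P(sigma) ∝ 1/sigma, marginalizing the
    Gaussian likelihood over sigma gives P(D|k_max, K_M) ∝ SSR^(-N/2)."""
    k_max, K_M = params
    ssr = np.sum((k - monod(s, k_max, K_M)) ** 2)
    return 0.5 * len(k) * np.log(ssr)

rng = np.random.default_rng(0)
s = np.geomspace(1e-3, 2.0, 38)            # toy fructose concentrations (% w/v)
k = monod(s, 0.42, 0.026) + rng.normal(0, 0.01, s.size)

# Coarse grid search as a stand-in for a gradient-based optimizer.
grid_k = np.linspace(0.3, 0.6, 121)
grid_K = np.linspace(0.005, 0.08, 151)
vals = [(neg_log_marglik((a, b), s, k), a, b) for a in grid_k for b in grid_K]
_, k_max_hat, K_M_hat = min(vals)
```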

Approximating data with a piece-wise linear model
Although our goal is to allow scientists to choose objectively the segment of their data that is 'most' linear, we adopt a general methodology and allow the data to be described by linear combinations of arbitrary basis functions. For straight lines, there are two basis functions, φ_1(x) = 1 and φ_2(x) = x, but datasets may require higher order polynomials or even Gaussian or sigmoid functions (Bishop 2006). For a 1D time series and a given set of basis functions, we infer the optimal piece-wise description: the number of contiguous segments into which we should divide the data, where the boundaries of each of those segments should lie, and the best-fit linear combination of basis functions for each segment. Deciding which of these segments is most appropriate for the task in hand is unavoidably subjective. It is straightforward, however, to compare different segments by comparing properties of their best-fit linear combinations. For lines, these properties include their gradients and R² values, i.e. how much of the variance of the dependent variable is explained by the independent one (Moses 2017).
We use a Bayesian approach to infer the best piece-wise description and assume only that the data of each segment are normally distributed around a linear combination of the basis functions (Section 2). To proceed analytically, we marginalize over all coefficients constituting the linear combination for each segment, using a mild approximation, and choose the optimal number of segments by comparing marginal likelihoods. The data points bounding each segment are then estimated by the means of their posterior distributions. We consider the case of known measurement error separately from that of unknown error and call our algorithm nunchaku.

Verifying our approach
To verify our methodology (Section 2), we first focused on identifying linear regions. We generated synthetic data using piece-wise linear functions, for which we know the number of segments and their gradients, added Gaussian noise, and then inferred from these data the optimal number of segments and the gradients of the best-fit lines, assuming that we know the magnitude of the measurement noise (Fig. 1A).
The algorithm predicts correctly the number of segments when the noise in the data is sufficiently low (Fig. 1B and Supplementary Fig. S1) but underestimates this number when the noise is larger. Such noise tends to blur two neighbouring segments so that they seem one, rather than cause a single segment to appear as two or more. Similarly, if we decrease the angle between neighbouring segments, the noise is more likely to make two neighbouring segments appear contiguous, and the algorithm's accuracy falls (Supplementary Fig. S1).
We confirmed that the algorithm also correctly predicts the underlying piece-wise linear functions, and hence the gradients of the lines generating the data in the segments (Supplementary Fig. S1). As expected, this accuracy too falls with noisier data.
When the measurement error is unknown, the results are similar (Supplementary Fig. S1), but the algorithm is slower because we numerically integrate over all possible magnitudes of this measurement error. We also confirmed that the algorithm's performance is robust to broad choices of the prior distribution (Supplementary Fig. S2).
We next compared our methodology to the Narrowest-Over-Threshold (NOT) algorithm (Baranowski et al. 2019), a state-of-the-art frequentist approach. Whether we consider the root mean square error between the best-fit lines and the ground truth (Fig. 2A) or the predicted number of segments (Fig. 2B), our algorithm consistently performs as well as or better (see also Supplementary Fig. S3). This greater accuracy, however, comes at the expense of speed: the NOT algorithm is faster than our implementation of nunchaku.
Finally, we demonstrated that nunchaku works with other basis functions, including constant functions, third-order polynomials, and sines (Supplementary Fig. S4).

Application 1: finding the range of OD that increases linearly with cell number
The OD of a microbial culture increases linearly with the number of cells only for sufficiently small ODs. At higher ODs, the light from the spectrophotometer may scatter off multiple cells, and the relationship between OD and the number of cells becomes nonlinear (Stevenson et al. 2016). To calibrate OD measurements, researchers often serially dilute a dense culture of microbes and measure the relationship between the OD and the dilution factor (Warringer and Blomberg 2003, Stevenson et al. 2016) (Fig. 3A). Interpolating this curve, we can convert an OD measurement to the corresponding dilution factor and so correct for any nonlinearity between the OD and cell numbers.
Dilution factors, however, are not intuitive units, and it is useful to identify the range of ODs over which there is a linear relationship with cell numbers. Not only is this range itself important, but by using the ratio of the maximum of the range to the corresponding dilution factor, we can re-scale the dilution factors back into ODs.
We used the nunchaku algorithm to identify the linear range, using basis functions that generate straight lines and an unknown measurement error. Two linear segments are optimal, and the one of interest, where OD is proportional to the number of cells, is the segment beginning at the smallest OD. This segment also has the highest coefficient of determination R². Its maximal OD is 0.66 for a relative cell number of 0.25 (Fig. 3A), and we should therefore multiply the dilution factors by 0.66/0.25, or approximately 2.6, to convert back to ODs.
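Once the linear range and its scale factor are known, converting a reading via the calibration curve is a one-line interpolation. A sketch with toy calibration numbers (not the paper's data; the saturating curve and all values are ours):

```python
import numpy as np

# Hypothetical calibration data: measured OD versus relative cell number,
# saturating at high density (toy curve, not the experimental one).
rel_cells = np.linspace(0.0, 1.0, 21)
od = 2.0 * (1.0 - np.exp(-1.5 * rel_cells))

# Ratio of the maximal OD of the linear segment to the corresponding
# relative cell number, as quoted in the text: 0.66/0.25.
scale = 0.66 / 0.25

def corrected_od(od_reading):
    """Interpolate a raw OD reading onto the calibration curve to get the
    relative cell number, then rescale back into OD units."""
    return scale * np.interp(od_reading, od, rel_cells)
```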

Application 2: identifying the log phase of microbial growth
Microbes are most often studied when growing exponentially, with the log(OD) of the culture increasing linearly with time (Monod 1949). Researchers identify this log-phase growth from microbial growth curves.
To detect log phase automatically, we applied nunchaku, again with basis functions generating lines, to OD measurements of E. coli (Fig. 3B). Partitioning the data into six segments is optimal, and the segment whose best-fit line has the highest gradient, and so the greatest specific growth rate, corresponds to exponential growth.
Monod noticed an empirical relationship between the nutrient concentration and the specific growth rate of microbes in log phase (Monod 1949). Denoting this growth rate as k, the maximal specific growth rate as k_max, and the nutrient concentration as s, his equation is

k = k_max s / (K_M + s),

where K_M is now called the Monod constant. To estimate k_max and K_M, researchers systematically vary the concentration of the carbon source and identify the log phase and the corresponding gradient for each growth curve.
Here, we use the nunchaku algorithm to select the data used to estimate k_max and K_M for S. cerevisiae growing on fructose (Section 2), from 38 growth curves measured with plate readers (Fig. 3C). Each biological replicate has two technical replicates.

Discussion
Determining where data are best described by a line is a problem familiar to most scientists. We present a statistically rigorous solution, which we generalize by considering linear combinations of arbitrary basis functions. Our methodology is Bayesian and similar in approach to earlier work that focused on piece-wise constant functions (Hutter 2007).
Like all Bayesian inference, our algorithm depends on prior information: the bounds on the coefficients constituting the linear combination of basis functions. For basis functions generating lines, these bounds describe the range of the gradients and intercepts of all possible lines within a segment. The optimal number of segments will depend on this prior if the amount of data is sufficiently small, as it should (MacKay 2003). In practice, however, users interested in lines need specify only one prior range, with the other inferred (Section 2), and we see that although a wide prior favours fewer segments, a single segment is robustly assigned to sections of the data that appear linear.
Our method makes two assumptions about how the data deviate from a linear combination of basis functions. We assume these deviations are independent, and we assume that each deviation obeys a normal distribution. For some data, a distribution with purely nonnegative support, such as a log-normal, may be more appropriate. Although we can use such a distribution in principle, in practice some of the steps that we performed analytically would have to become numerical. Further, if nothing is known a priori about these deviations, we assume that their standard deviation is identical for all time points. Our algorithm would also work if the standard deviations vary but are proportional to a known function of x_j and y_j.
Our work adds to existing algorithms for detecting change points in time series, including those aimed at analyzing microbial growth (Papastamoulis et al. 2019). We have simplified this problem by considering change points to occur only at data points and by imposing no continuity between the functions underlying the data of each segment. These simplifications are not restrictive for our task of finding one particular segment of interest. Identifying change points more generally typically requires Markov chain Monte Carlo methods (Stephens 1994, Papastamoulis et al. 2019).
By using enumeration, the nunchaku algorithm is robust, and it lends itself to automation, facilitating high-throughput studies. It should both ease and increase the reproducibility of data analyses for a wide range of scientists.

Figure 1. The nunchaku algorithm correctly predicts the number of linear segments in synthetic data when the measurement noise is not too high. (A) Example synthetic datasets with the ground truth in blue (small circles) and the triplicate data in light grey. The large red circles are the predicted boundaries of each linear segment, with the best-fit line in red. Left: with a measurement error of 0.25, the predictions overlap the data; right: with a measurement error of 8, the predictions miss some segments, which the noise obscures. As a prior, we specify only that the gradient of each line lies between [−25, 25]. For these data, a measurement error of 0.25 is 0.5% of the mean of y and an error of 8 is almost 15%. (B) The algorithm underestimates the number of linear segments only once the magnitude of the measurement noise becomes sufficiently high. The actual number of segments is M; the estimated number is M*.

Figure 3. The nunchaku algorithm gives intuitive results when applied to biological data. (A) The calibration curve for plate-reader measurements of the OD of S. cerevisiae, found by diluting an overnight culture in 2% fructose, is nonlinear (blue dots). There are three replicate measurements for each dilution factor. Our algorithm identifies two linear segments (boundaries marked as circles). Lighter orange circles bound the segment with the highest R². We specify the likely maximal range of OD as our prior: [0, 2]. Inset: the logarithm of the model evidence for the number of segments. (B) Identifying contiguous linear segments in the logarithm of the OD of growing E. coli cells as a function of time allows us to identify automatically the region of exponential growth. We show the mean of four replicate measurements (blue) with twice their standard deviation shaded. Circles denote the boundaries of linear segments; orange circles bound the segment whose best-fit line has the highest gradient and so the highest specific growth rate. The average specific growth rate over this segment is 1.5 h⁻¹. Inset: the logarithm of the model evidence for the number of segments. (C) With our algorithm, we can automatically identify the region of exponential growth in multiple datasets, here 38, to reveal growth laws such as Monod's equation. We plot the specific growth rate in log phase for S. cerevisiae as a function of the concentration of fructose, with the solid line a fit of Monod's equation: k_max = 0.422 ± 0.006 h⁻¹ and K_M = 0.026 ± 0.002% (w/v). The shaded area shows the 95% confidence interval. Inset: three example growth curves with dots marking the region of exponential growth, identified as the segment with the highest gradient. For panels (B) and (C), we specify a prior on the range of the gradient: [0, 5] h⁻¹.