Designing More Efficient Preclinical Experiments: A Simulation Study in Chemotherapy-Induced Myelosuppression

A new, more efficient preclinical study design (referred to as a compact design) is proposed. It removes the need for satellite animals for the collection of toxicokinetic (TK) data by sampling from the main study animals, taking no more than one sample in 24 h to build up a full profile over the course of the study. The compact design's performance was tested with a simulation study, using an example of chemotherapy-induced myelosuppression in rats. Data sets were simulated from a model based on available data, following both the compact design and a traditional design using satellite animals, with 100 studies simulated for each. The effect of the compact design on parameter and variance estimates for the TK and neutrophil models was investigated, as well as the potential effect of interoccasion variability (IOV). The compact design performed as well as the traditional design and had little impact on parameter or variance estimates, indicating that it would be a suitable alternative to traditional satellite designs while reducing the number of animals required. When IOV was present but not accounted for during the TK analysis, some parameter estimates were biased and the interindividual variation and residual error were inflated; this was reduced by allowing for IOV in the analysis. Using the compact design removes the need for a satellite group, reducing the number of animals required, without affecting the ability to model the data. If large IOV is suspected, caution should be exercised to avoid parameter estimation bias and inflation of variability and residual error.

Currently, many preclinical studies rely on a separate group of animals (referred to as satellite animals) to characterize the toxicokinetics (TK) of a compound. These animals are treated per protocol but do not undergo the same pharmacodynamic (PD) assessments as the main study animals. One reason for this is that taking the additional samples required to estimate the TK can lead to physiological or behavioral differences. These differences could introduce bias in the PD endpoint (Nedelman et al., 1993; Viberg et al., 2012) and so compromise the integrity of the study. For neutrophil counts, this could take the form of stress induced by the extra sampling, which can decrease white blood cell counts (Mahl et al., 2000; Zeller et al., 1998).
A new design (referred to as the compact design) is proposed for preclinical studies with PD endpoints that span multiple dosing occasions. It is tested here with a simulation study of chemotherapy-induced myelosuppression in rats. The design removes the need for satellite animals by allowing the TK to be characterized from the main study animals. The TK samples are taken over multiple days, at the same time as the PD samples, with the aim of reducing stress. The post-dose sampling times remain the same as in the satellite group, allowing a full TK profile to be built up over the course of the study. However, only 1 TK sample is taken on each day on which PD sampling is scheduled. The days on which samples are taken are therefore determined by the requirements of the PD, and the time of the sample on each of those days by the requirements of the TK. This allows a reduction in the number of animals required, in keeping with both the reduction and refinement principles of the 3Rs (Russell and Burch, 1959), as well as being more cost-effective and quicker to run. The 3Rs principles were developed as a framework for the humane treatment of animals in research. The first principle, replacement, refers to methods that replace the use of animals in experiments; the second, reduction, aims to minimize the number of animals required; and the final principle, refinement, refers to the improvement of animal welfare. The design could be further refined by optimizing the TK sampling times and the PD sampling days, which may allow even fewer animals to be used and further reduce the sampling burden on each animal (Aarons and Ogungbenro, 2010; Bazzoli et al., 2010).
Further, measurement of TK in all animals allows individual parameter estimates to be used when fitting a PD model, instead of relying on population values. Using population values ignores the interindividual variation (IIV) in the TK model, which will inflate the variability estimated in the PD parameters (Zhang et al., 2003). A previous study comparing sequential methods for fitting PD models also found that using individual parameter estimates consistently performed better than using population values only, in terms of bias in parameter estimates (Collins, 2013).
Similar designs that take 1 or 2 TK samples per day over the course of a study to build up a profile have been proposed previously (Nedelman et al., 1993; van Bree et al., 1994; Viberg et al., 2012). These were found to be successful when tested on small groups of animals. When using this approach, precautions must be taken to ensure that the volume sampled falls within current guidelines, such as the European good practice guide (Diehl et al., 2001), which gives maximum sampling volumes over time; the volume required depends on the assay used.
There are 2 main questions that the simulation study aims to address. The first is whether the new design allows the model parameters to be estimated as well as with a traditional design, despite the change in sampling schedule and the reduction in the overall number of animals. The second is whether, when the TK samples are split over a number of days, interoccasion variability (IOV) could affect parameter estimation. IOV describes variation within an individual across different dosing occasions, the sources of which are often unexplained. It has been shown that ignoring IOV can affect model parameter estimation, especially when IOV is higher than IIV, but this can be avoided by including IOV in the analysis (Karlsson and Sheiner, 1993). The total amount of variation in the data is preserved. If IOV is ignored during analysis, this variation will instead be attributed to the IIV or residual error, which will become inflated (Ahn and French, 2010; Karlsson and Sheiner, 1993; Laporte-Simitsidis et al., 2000). Misspecification of the variance structure is particularly problematic if the model is to be used for future simulations (Holford et al., 2000; Mould and Upton, 2013).
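The preservation argument can be made explicit with a short variance decomposition. This sketch assumes the usual log-normal parameterization with independent IIV and IOV terms; the text does not state this form for the present model. Here $\eta_i$ is the random effect for animal $i$ and $\kappa_{ij}$ the random effect for occasion $j$ in that animal:

$$\ln \mathrm{CL}_{ij} = \ln \theta_{\mathrm{CL}} + \eta_i + \kappa_{ij}, \qquad \eta_i \sim N(0, \omega^2), \quad \kappa_{ij} \sim N(0, \pi^2)$$
$$\mathrm{Var}(\ln \mathrm{CL}_{ij}) = \omega^2 + \pi^2$$

If the analysis model omits $\kappa_{ij}$, the $\pi^2$ component does not disappear. It must be absorbed partly into the estimate of $\omega^2$ and partly into the residual error, which is the inflation described in the preceding paragraph.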

MATERIALS AND METHODS
Development of neutrophil model. Three data sets of absolute neutrophil counts (ANC) were available for the selection and fitting of a model. In each study, the investigational cytotoxic was administered orally to rats at various doses and dose intervals, with follow-up to 21 days. In total, data from 136 rats (104 male, 32 female) were available, with doses ranging from 5 to 200 mg/kg and treatment lasting from 1 to 14 days. One study included 21 satellite animals, some of which were sampled following a single dose and some following multiple doses. No TK samples were taken in the other 2 studies. The Friberg neutrophil model (Friberg et al., 2002) was chosen to describe the ANC data; it uses both system- and drug-related parameters (Figure 1).
The proliferative compartment represents proliferative cells such as stem cells and progenitor cells. It is followed by transit compartments, which represent the maturation of the cells, and finally by the circulatory compartment, where neutrophil counts are observed. Circ0 is the baseline level of neutrophils. MTT is the mean transit time, MTT = (n + 1)/kTR, where n is the number of transit compartments; MTT represents the mean maturation time of the neutrophils. EDrug describes the effect of the drug on the neutrophil counts as an inhibition of the proliferation rate. The differential equations are given below (equations 1-5).
$$\frac{d\mathrm{Prol}}{dt} = k_{\mathrm{Prol}} \cdot \mathrm{Prol} \cdot (1 - E_{\mathrm{Drug}}) \cdot \left(\frac{\mathrm{Circ}_0}{\mathrm{Circ}}\right)^{\gamma} - k_{\mathrm{TR}} \cdot \mathrm{Prol} \qquad (1)$$
$$\frac{d\mathrm{Transit}_1}{dt} = k_{\mathrm{TR}} \cdot \mathrm{Prol} - k_{\mathrm{TR}} \cdot \mathrm{Transit}_1 \qquad (2)$$
$$\frac{d\mathrm{Transit}_2}{dt} = k_{\mathrm{TR}} \cdot \mathrm{Transit}_1 - k_{\mathrm{TR}} \cdot \mathrm{Transit}_2 \qquad (3)$$
$$\frac{d\mathrm{Transit}_3}{dt} = k_{\mathrm{TR}} \cdot \mathrm{Transit}_2 - k_{\mathrm{TR}} \cdot \mathrm{Transit}_3 \qquad (4)$$
$$\frac{d\mathrm{Circ}}{dt} = k_{\mathrm{TR}} \cdot \mathrm{Transit}_3 - k_{\mathrm{Circ}} \cdot \mathrm{Circ} \qquad (5)$$
Here Prol are the proliferative cells and Circ are the neutrophils circulating in the blood, with initial conditions Prol(0) = Transit1(0) = Transit2(0) = Transit3(0) = Circ(0) = Circ0. Data from all 3 studies were combined for model selection and fitting. The TK and neutrophil models were fitted to the data sequentially because of the high computation times associated with simultaneous analysis (approximately 5 times longer when fitting this model to these data). Individual TK parameters were used, where available, to fit the neutrophil model; otherwise population values were used.
FIG. 1. Description of the semiphysiological Friberg model of myelosuppression following chemotherapy, adapted from Friberg et al. (2002). The kPROL rate parameter describes the production of new proliferative cells. The kTR rate constant relates to the movement of proliferative cells through the transit compartments to the blood, where they are observed, and kCIRC is the rate constant describing the removal of neutrophils from the blood. Circ0 is the concentration of neutrophils at baseline, and Circ is the observed absolute neutrophil count (ANC); together they describe the feedback mechanism. EDrug is a function of the drug concentration which describes the effect of the drug on the rate at which new neutrophils are created.
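To make equations 1-5 concrete, the following is a minimal Python sketch of the Friberg model with a linear drug effect, EDrug = Slope × concentration (the linear linking model selected in this work). It is integrated with a fixed-step fourth-order Runge-Kutta scheme and is not the NONMEM implementation used in the study. The sketch makes several assumptions:

- kPROL = kCIRC = kTR, as in the original Friberg parameterization, which gives a steady state at Circ0.
- Time is in hours, and there are 5 state variables (Prol, 3 transit compartments, Circ).
- The concentration function and the Circ0 and Slope values are hypothetical.
- MTT = 55 h and gamma = 0.67 are the estimates reported in this work.

```python
import math

N_TRANSIT = 3  # number of transit compartments selected for this model


def friberg_rhs(state, conc, p):
    """Derivatives of (Prol, Transit1, Transit2, Transit3, Circ).

    Assumes k_prol = k_circ = k_tr and a linear effect E_Drug = slope * conc.
    """
    prol, t1, t2, t3, circ = state
    k_tr = (N_TRANSIT + 1) / p["mtt"]          # from MTT = (n + 1) / k_tr
    e_drug = p["slope"] * conc                  # linear linking model
    feedback = (p["circ0"] / circ) ** p["gamma"]
    return [
        k_tr * prol * (1.0 - e_drug) * feedback - k_tr * prol,  # eq 1 (k_prol = k_tr)
        k_tr * prol - k_tr * t1,                                 # eq 2
        k_tr * t1 - k_tr * t2,                                   # eq 3
        k_tr * t2 - k_tr * t3,                                   # eq 4
        k_tr * t3 - k_tr * circ,                                 # eq 5 (k_circ = k_tr)
    ]


def simulate_anc(conc_fn, p, t_end, dt=0.1):
    """Fixed-step RK4 integration; conc_fn(t) gives the drug concentration at t (h)."""
    def f(t, s):
        return friberg_rhs(s, conc_fn(t), p)

    state = [p["circ0"]] * 5                    # all compartments start at Circ0
    t = 0.0
    times, anc = [t], [state[-1]]
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, state)
        k2 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k1)])
        k3 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k2)])
        k4 = f(t + dt, [s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += dt
        times.append(t)
        anc.append(state[-1])
    return times, anc


# Hypothetical example: one dose whose concentration decays mono-exponentially.
params = {"circ0": 2.0, "mtt": 55.0, "gamma": 0.67, "slope": 0.05}
times, anc = simulate_anc(lambda t: 10.0 * math.exp(-0.1 * t), params, t_end=21 * 24)
nadir = min(anc)
print(f"nadir {nadir:.2f} at day {times[anc.index(nadir)] / 24:.1f}")
```

With no drug, every derivative is zero at Circ0, so the counts stay at baseline. With drug, the feedback term (Circ0/Circ)^gamma drives the rebound after the nadir.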
The first-order conditional estimation method and subroutine ADVAN13 in the nonlinear mixed-effects (NLME) modeling software NONMEM version 7.3 (Beal et al., 2009) were used for all analyses. Fixed effects (typical parameter values) and random effects (IIV and residual error) were estimated. IIV was assumed to be lognormally distributed. Both additive and multiplicative residual error models were tested. The effect of sex on all parameters was investigated. Linear, log-linear, and Emax models were tested as the function linking the TK to the neutrophil model. The number of transit compartments that best described the data was also investigated. All standard errors were confirmed by bootstrapping using Perl-speaks-NONMEM (PsN) (Lindbom et al., 2004) with 1000 replicates. A visual predictive check was carried out to evaluate the performance of the model.
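PsN automates the bootstrap for NONMEM. The underlying idea is to resample whole animals with replacement, so that within-animal correlation is preserved, and to take the standard deviation of the re-estimated parameters as the standard error. This is a minimal Python sketch. In practice each bootstrap sample is refitted with the full model; here the estimator is a hypothetical stand-in (the mean of per-animal baseline values) so that the example is self-contained.

```python
import random
import statistics


def bootstrap_se(data_by_animal, estimator, n_boot=1000, seed=None):
    """Case bootstrap: resample animals (not single observations) with replacement,
    re-apply the estimator, and return the SD of the bootstrap estimates as the SE."""
    rng = random.Random(seed)
    ids = list(data_by_animal)
    estimates = []
    for _ in range(n_boot):
        sample = [data_by_animal[rng.choice(ids)] for _ in ids]  # same size as original
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)


# Hypothetical per-animal baseline ANC values; the estimator is their mean.
baselines = {f"rat{i}": 1.5 + 0.1 * (i % 7) for i in range(30)}
print("bootstrap SE of mean baseline:",
      round(bootstrap_se(baselines, statistics.mean, seed=1), 4))
```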
Performance of compact design. To investigate how accurately the compact design recovers parameter estimates, data from the 2 study designs were simulated using the final population parameter values estimated during the modeling process above. The simulated satellite design was based on the original dose-range-finding study, which included 76 rats in total. It used satellite animals (approximately 2 for every 3 in the main study group) to estimate the TK, with 5 TK samples taken in the 24 h following the first dose. In the compact design, the sampling days were chosen to be the same as those in the main study group of the original study (baseline and days 2, 4, 8, 12, and 15). The post-dose times were chosen to be the same as those used in the satellite group (0.5, 2, 4, 8, and 24 h), with only 1 TK sample taken each day, giving a whole TK profile over the course of the study. This compact design reduces the number of animals needed from 76 to 48. For each of the 2 designs, 100 data sets were simulated, and the same model was then fitted to each simulated data set. The 100 sets of parameter estimates for each design were then summarized and compared with the values used in the simulation.
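The compact sampling rule can be sketched as a small schedule generator. This is an illustration only. The text fixes the sampling days and post-dose times but not which time is assigned to which day in each animal. The rotation used here (each animal starts one position further along the list of times) is a hypothetical choice. It ensures that every animal contributes one sample at each post-dose time and that all times are represented across animals on each day. Baseline is assumed to be a PD-only sample, leaving 5 TK days for the 5 post-dose times.

```python
# Sampling days and post-dose times from the compact design described above.
PD_DAYS = [2, 4, 8, 12, 15]           # baseline treated as PD-only (assumption)
TK_TIMES_H = [0.5, 2, 4, 8, 24]       # post-dose times from the satellite group


def compact_schedule(n_animals, days=PD_DAYS, times=TK_TIMES_H):
    """Return {animal: [(day, hours_post_dose), ...]} with one TK sample per day.

    Hypothetical rotation: animal i uses the times shifted by i positions, so each
    animal samples every post-dose time exactly once over the study.
    """
    if len(days) != len(times):
        raise ValueError("the design needs one post-dose time per sampling day")
    k = len(times)
    return {
        animal: [(day, times[(j + animal) % k]) for j, day in enumerate(days)]
        for animal in range(n_animals)
    }


for animal, samples in compact_schedule(3).items():
    print(animal, samples)
```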
Impact of interoccasion variability. Splitting the TK samples over multiple days introduces the potential for IOV to bias the TK parameter estimates and the estimation of IIV and residual error. An advantage of this design, however, is that IOV can be assessed. IOV could not be estimated in the original data, as samples from multiple occasions were only available in a limited number of animals. Instead, an approach similar to that of a previous study, which assessed the impact of different combinations of IIV and IOV values (Karlsson and Sheiner, 1993), was employed. IIV was simulated at 2 levels, low (32%) and high (55%), on both the clearance and volume parameters. Three levels were simulated for IOV: zero, acting as a control, and the same low and high values as used for IIV. IOV was simulated on the clearance parameters only, as in a previous study using the same neutrophil model. The impact of IOV on the neutrophil model was not investigated because the PD samples were taken on the same days in both designs. Any IOV in the neutrophil counts would therefore affect both designs equally, whereas IOV in the TK would affect the compact design but not the traditional design. Furthermore, as doses were given daily in both designs rather than in courses of treatment, occasions would be difficult to define for the neutrophil count. Previous studies have shown that IOV in neutrophil counts is low in comparison with IIV (Hansson et al., 2010), which minimizes bias (Karlsson and Sheiner, 1993).
The 6 combinations of variation (2 IIV levels × 3 IOV levels) were each simulated 100 times, incorporating IOV in the model using the method outlined in Karlsson and Sheiner (1993). The original model was then fitted to each simulated data set, assuming no IOV was present. To investigate whether the presence of IOV could bias parameter estimates, the estimated parameters for each simulated data set were summarized for each combination of values. The analysis was then repeated with IOV estimated. This assessed whether IOV could be estimated accurately and whether doing so improved parameter estimation, by comparison with the results obtained when IOV was ignored.
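The way IIV and IOV enter the simulation can be sketched in the log-normal form used by Karlsson and Sheiner (1993). Each animal has a fixed random effect eta on clearance, and each occasion adds an independent random effect kappa. This is a minimal Python sketch, not the NONMEM code used in the study, and it rests on several assumptions:

- The typical clearance of 1.0 is a hypothetical value.
- The group of 48 animals with 5 TK occasions follows the compact design.
- The text does not state how the percentages (32% and 55%) were defined, so converting them to log-scale standard deviations with omega = sqrt(ln(1 + CV^2)) is an assumption.

```python
import math
import random


def omega_from_cv(cv):
    """Log-scale SD giving coefficient of variation cv for a log-normal parameter (assumption)."""
    return math.sqrt(math.log(1.0 + cv ** 2))


def simulate_clearance(tv_cl, omega_iiv, omega_iov, n_animals, n_occasions, seed=None):
    """CL[i][j] = TVCL * exp(eta_i + kappa_ij).

    eta_i ~ N(0, omega_iiv^2) is drawn once per animal (IIV);
    kappa_ij ~ N(0, omega_iov^2) is redrawn for every occasion (IOV).
    """
    rng = random.Random(seed)
    cl = []
    for _ in range(n_animals):
        eta = rng.gauss(0.0, omega_iiv)
        cl.append([tv_cl * math.exp(eta + rng.gauss(0.0, omega_iov))
                   for _ in range(n_occasions)])
    return cl


# The 6 variability scenarios: 2 IIV levels x 3 IOV levels (IOV = 0 is the control).
scenarios = [(iiv, iov) for iiv in (0.32, 0.55) for iov in (0.0, 0.32, 0.55)]
for iiv, iov in scenarios:
    cl = simulate_clearance(1.0, omega_from_cv(iiv), omega_from_cv(iov),
                            n_animals=48, n_occasions=5, seed=1)
    print(f"IIV {iiv:.0%}, IOV {iov:.0%}: animal 0 CL by occasion",
          [round(c, 2) for c in cl[0]])
```

When IOV is zero, each animal's clearance is identical on every occasion. Any occasion-to-occasion spread in the other scenarios comes only from the kappa term.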

RESULTS
Neutrophil Model
A 1-compartment TK model was found to be sufficient to describe the TK profiles. Absorption was described as first-order, with absorption rate constant kA. Concentration-time profiles showed clear differences between males and females, and including sex as a covariate on the volume parameter significantly improved the model. Males were used as the reference and had an average volume of distribution of around 12 L/kg, whereas females were estimated to have a smaller volume of 7 L/kg. The estimated parameter values for the TK model are shown in Table 1.
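For reference, the structure of the TK model (1 compartment, first-order absorption, sex as a covariate on volume) can be written as a short Python function, with repeated daily doses handled by superposition. This is a sketch under several assumptions:

- Clearance and volume are treated as apparent values (CL/F, V/F), because bioavailability cannot be separated from oral data alone.
- kA must differ from CL/V.
- The clearance and kA values in the example are hypothetical.
- Only the volumes of about 12 L/kg (males) and 7 L/kg (females) come from the text.

```python
import math


def volume_for_sex(sex, v_male=12.0, v_female=7.0):
    """Sex as a covariate on volume (L/kg); defaults are the approximate estimates in the text."""
    return v_female if sex == "F" else v_male


def conc_oral_1cmt(t, dose, cl, v, ka, tau=24.0, n_doses=1):
    """Concentration (mg/L) at time t (h) after n_doses oral doses (mg/kg) every tau h.

    Single-dose solution C = D*ka / (V*(ka - k)) * (exp(-k*t) - exp(-ka*t)), k = CL/V,
    summed over the doses already given (superposition). Requires ka != CL/V.
    """
    k = cl / v
    total = 0.0
    for i in range(n_doses):
        td = t - i * tau          # time since dose i
        if td < 0:
            break                 # later doses have not been given yet
        total += dose * ka / (v * (ka - k)) * (math.exp(-k * td) - math.exp(-ka * td))
    return total


# Hypothetical example: 10 mg/kg daily for 14 days, CL/F = 1 L/h/kg, kA = 1 /h.
for sex in ("M", "F"):
    v = volume_for_sex(sex)
    profile = [conc_oral_1cmt(t, 10.0, 1.0, v, 1.0, n_doses=14) for t in (0.5, 2, 4, 8, 24)]
    print(sex, [round(c, 3) for c in profile])
```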
The Friberg model was fitted to all data using a linear linking model (equation 6), which was found to give an adequate fit to the data:
$$E_{\mathrm{Drug}} = \mathrm{Slope} \cdot C \qquad (6)$$
where C is the drug concentration. Different numbers of transit compartments were investigated, and 3 was found to be optimal, as in previous uses of this model (Friberg et al., 2002). The inclusion of sex in the TK model proved sufficient to explain the observed differences, so sex was not included as a covariate in the neutrophil model. IIV significantly improved the model when estimated on Circ0, MTT, and Slope, the same 3 parameters as in the original paper.
RSE, relative standard error calculated from the covariance matrix; bootstrapped relative standard errors are given in parentheses.
The neutrophil system parameter estimates broadly agreed with those of another published rat study. That study reported a similar MTT (53 h) to that found here (55 h). Gamma was previously estimated as 0.15, compared with 0.67 here. The Circ0 parameter could not be directly compared, as white blood cell counts were modelled in the earlier study rather than the ANC used here. Model diagnostics showed that the model fitted reasonably well, although it overestimated the effects at very low doses. Good agreement was found between the bootstrapped standard errors and those output by NONMEM for most parameters.
Examples of the model fit are shown in Figure 2 for 2 of the dose groups with the most data. When a single larger dose is given (left plot), the neutrophil count drops quickly, reaching the nadir (minimum) around day 4. After day 4, the neutrophil count rebounds, overshoots the baseline, and then returns to baseline around day 13. Following 14 lower daily doses beginning on day 0 (right plot), the nadir is slightly lower and is not reached until day 9. The count then remains low until around day 16, when follow-up ends. The latest sampling time point in any animal was day 21, which does not allow the rebound following 14 daily doses to be fully observed. The 95% prediction intervals were calculated by simulating 1000 data sets from the model and taking the 2.5th and 97.5th percentiles.
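The percentile step used for the prediction intervals is simple to express in code. This minimal Python sketch assumes that the simulated replicates have already been generated (in practice with NONMEM/PsN). It also assumes they are arranged as one list of simulated values per replicate, at common time points. The linear-interpolation percentile matches NumPy's default method, which may differ slightly from the convention used by the software in the study. The toy replicates are synthetic, not model output.

```python
import random


def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)                       # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def prediction_interval(replicates, lower=2.5, upper=97.5):
    """replicates[r][j] is the simulated value for replicate r at time point j.

    Returns one (lower, upper) pair per time point: a 95% interval by default.
    """
    n_times = len(replicates[0])
    return [
        (percentile([rep[j] for rep in replicates], lower),
         percentile([rep[j] for rep in replicates], upper))
        for j in range(n_times)
    ]


# Toy usage with 1000 synthetic replicates at 3 time points.
rng = random.Random(0)
reps = [[rng.gauss(2.0, 0.3), rng.gauss(1.0, 0.2), rng.gauss(2.2, 0.4)] for _ in range(1000)]
for j, (low, high) in enumerate(prediction_interval(reps)):
    print(f"time point {j}: 95% PI {low:.2f} to {high:.2f}")
```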

Performance of Compact Design
The median parameter estimates from the 2 designs are comparable (Table 2), despite the difference in sample size (48 in the satellite design compared with 24 in the compact design). In general, IIV is well recovered, except for the slope parameter of the linking model, which is overestimated by both designs. Residual error in the neutrophil model is inflated in both designs.
Box plots of the relative estimation errors of the TK parameter estimates for each design (Figure 3) illustrate how the median parameter estimates, and the variation around them, compare with the theoretical values used in the simulation. Collecting the TK data over multiple days (compact design) does not appear to affect the ability to estimate the parameters of the TK model. The shrinkage values for the TK model in the compact design are small, with medians of 1% on clearance and 9% on volume. Minimal bias is therefore expected when using individual TK parameter estimates in the neutrophil model (Savic and Karlsson, 2009).
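The 2 summary quantities used in this section can be sketched in Python as follows:

- Relative estimation error is taken as REE (%) = 100 × (estimate - true value)/true value. This is a common definition, but the text does not spell it out.
- Eta-shrinkage follows Savic and Karlsson (2009) as 1 - SD(individual eta estimates)/omega, where omega is the estimated IIV standard deviation.

The numbers in the example are illustrative, not results from the study.

```python
import statistics


def relative_estimation_error(estimates, true_value):
    """REE (%) for each simulated study: 100 * (estimate - true) / true."""
    return [100.0 * (est - true_value) / true_value for est in estimates]


def eta_shrinkage(individual_etas, omega):
    """Eta-shrinkage = 1 - SD(individual eta estimates) / omega (0 = none, 1 = complete)."""
    return 1.0 - statistics.stdev(individual_etas) / omega


# Illustrative numbers only: clearance estimates from 5 simulated studies, true CL = 1.0.
ree = relative_estimation_error([1.05, 0.97, 1.02, 0.99, 1.10], 1.0)
print("median REE (%):", round(statistics.median(ree), 1))
print("shrinkage:", round(eta_shrinkage([-0.30, 0.10, 0.25, -0.05, 0.00], 0.25), 2))
```

Low shrinkage means the individual estimates still carry most of the between-animal spread. This is why they can be passed to the neutrophil model with little bias.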
Similar plots for the parameters of the neutrophil model (Figure 4) also show little difference in the parameter estimates between the 2 designs, with Circ0 and gamma underestimated by both. The variation in parameter estimates appears to be slightly higher in the compact design, which may be a result of the reduction in sample size.
The parameter estimates were close to the theoretical values for both designs; having individual TK data in the compact design did not appear to improve the parameter estimates of the PD model. This could be due to the slow response of the model to changing drug concentrations. The drug effect acts on the rate of change of the neutrophil count, so the resulting changes in neutrophils depend on the history of drug exposure over time. Misprediction of the TK at a particular time therefore matters relatively little: if the predicted drug concentration is correct "on average" over time, the predicted PD is likely to be approximately correct. An alternative explanation is that the benefit of using individual TK in the neutrophil model may have been offset by the reduction in sample size.

Impact of Interoccasion Variability
When IOV was ignored during analysis, the clearance, volume, and sex-effect parameters were well estimated (Table 3). However, the estimate of kA became biased, decreasing as the IOV increased. The kA parameter may have been more susceptible to bias because it is the most difficult to estimate, as the data contain little information about it; however, it is expected to have little effect on the neutrophil model. Both IIV and residual error were inflated when IOV was ignored during analysis, by up to 1.5-fold for IIV and over 3-fold for residual error. When IOV was included in the analysis, the majority of parameters remained well estimated (Table 4). However, kA became inflated when IOV was equal to or higher than IIV, an observation that has been reported previously (Karlsson and Sheiner, 1993). The results show that IOV was well estimated using this design and that estimating it also improved the estimation of IIV and residual error. An example is shown in Figure 5: when IOV is ignored during analysis, the relative estimation errors of the IIV on the volume parameter become increasingly inflated as IOV increases. This trend is removed when IOV is estimated. The variability of the parameter estimates increases with IOV, regardless of the type of analysis.

DISCUSSION
In these simulations, the compact design was more efficient than the traditional satellite design, greatly reducing the number of animals required without increasing the number of sampling times, while still achieving the same results. This illustrates one way in which NLME modelling, rather than reliance on population-level estimates, can be used to improve study design and reduce the number of animals. The compact design still performs well in the presence of IOV, as long as IOV is accounted for during the analysis, which removes the potential bias in parameter estimates and the inflation of IIV and residual error.
Designs similar to the compact design have previously been successful when trialed with small numbers of animals (Nedelman et al., 1993; Viberg et al., 2012). This simulation lends weight to these findings by testing the design with larger groups of animals and more complex trial designs, with more dose groups and more sampling times. The results highlight the many practical advantages and the flexibility of these designs.
FIG. 5. Box plots of the relative estimation errors of the IIV on the volume parameter, comparing interoccasion variability being estimated during analysis (white) and being ignored during analysis (shaded), for increasing levels of simulated interoccasion variability.
The simulation study has only been carried out using a single model, which raises questions about generalizability. Nevertheless, its success suggests that the compact design could be an alternative for preclinical studies in other areas. The design could be applied to any study measuring a longer-term PD effect, by selecting the sampling days based on the requirements of the PD and the sampling times on those days based on the requirements of the TK. The design could be further improved by using optimal design methods to select more informative sampling times, possibly allowing a further reduction in sample size without loss of information.
Using this new compact study design in a preclinical setting would provide numerous advantages over the satellite designs frequently used. The compact design would allow fewer animals to be used, without additional sampling burden and without affecting the quality of the data or the breadth of analysis that could be carried out. This makes the compact design substantially more ethical, cost-effective, and quicker to complete. To confirm the potential benefits of this design, it should be further tested in a preclinical setting. Because of the nature of the design, it could be tested alongside a design with satellite groups, with no additional sampling times required, allowing a direct comparison.

FUNDING
This work was supported by a Biotechnology and Biological Sciences Research Council (BBSRC) industrial Collaborative Awards in Science and Engineering (CASE) studentship award.