Antipsychotic Behavioral Phenotypes in the Mouse Collaborative Cross Recombinant Inbred Inter-Crosses (RIX)

Schizophrenia is an idiopathic disorder that affects approximately 1% of the human population, and presents with persistent delusions, hallucinations, and disorganized behaviors. Antipsychotics are the standard pharmacological treatment for schizophrenia, but are frequently discontinued by patients due to inefficacy and/or side effects. Chronic treatment with the typical antipsychotic haloperidol causes tardive dyskinesia (TD), which manifests as involuntary and often irreversible orofacial movements in around 30% of patients. Mice treated with haloperidol develop many of the features of TD, including jaw tremors, tongue protrusions, and vacuous chewing movements (VCMs). In this study, we used genetically diverse Collaborative Cross (CC) recombinant inbred inter-cross (RIX) mice to elucidate the genetic basis of antipsychotic-induced adverse drug reactions (ADRs). We performed a battery of behavioral tests in 840 mice from 73 RIX lines (derived from 62 CC strains) treated with haloperidol or placebo in order to monitor the development of ADRs. We used linear mixed models to test for strain and treatment effects. We observed highly significant strain effects for almost all behavioral measurements investigated (P < 0.001). Further, we observed strong strain-by-treatment interactions for most phenotypes, particularly for changes in distance traveled, vertical activity, and extrapyramidal symptoms (EPS). Estimates of overall heritability ranged from 0.21 (change in body weight) to 0.4 (VCMs and change in distance traveled) while the portion attributable to the interactions of treatment and strain ranged from 0.01 (for change in body weight) to 0.15 (for change in EPS). Interestingly, close to 30% of RIX mice exhibited VCMs, a sensitivity to haloperidol exposure, approximately similar to the rate of TD in humans chronically exposed to haloperidol. Understanding the genetic basis for the susceptibility to antipsychotic ADRs may be possible in mouse, and extrapolation to humans could lead to safer therapeutic approaches for schizophrenia.


INTRODUCTION
Schizophrenia is a chronic, severe, and disabling brain disorder that affects about 1% of the population worldwide and is associated with substantial morbidity, mortality, and personal and societal costs (Saha et al. 2007;World Health Organization 2008;Laursen et al. 2012;Cloutier et al. 2016). Onset typically occurs in adolescence or early adulthood with the emergence of a number of symptoms, including hallucinations, delusions, and disorganized behavior (American Psychiatric Association. 2013). Antipsychotic medications are the mainstay of treatment for schizophrenia, but as much as 75% of patients discontinue assigned treatments due to side effects and/or inefficacy over relatively short periods of time (Lieberman et al. 2005). Deciphering the pharmacogenetics of antipsychotics could eventually allow the development of predictive algorithms, such that physicians could predict which patients are prone to develop side effects or less likely to achieve a therapeutic response. Advances in this arena could lead to safer and more efficacious therapeutic interventions for the treatment of schizophrenia.
Haloperidol is a high-potency, typical, first-generation antipsychotic whose use is associated with motoric adverse drug reactions (ADRs), including tardive dyskinesia (Soares-Weiser and Fernandez 2007), involuntary orofacial movements, and extrapyramidal symptoms (i.e., dystonia and Parkinsonism) (Miller et al. 2008). There is substantial inter-individual variation in liability to these ADRs and direct and indirect evidence suggest a role for genetic variation (Lerer et al. 2005;Patsopoulos et al. 2005;Bakker et al. 2006). There is significant heterogeneity in therapeutic response to antipsychotics (Lieberman et al. 2005), with roughly equal proportions of patients experiencing remission, partial response, and no benefit. Vacuous chewing movements (VCMs), which present in rodents as purposeless mouth openings in the vertical plane, are a valid rodent model for the human pharmacogenetic phenotype of tardive dyskinesia (Tomiyama et al. 2001;Turrone et al. 2002;Crowley et al. 2012b;Crowley et al. 2014). Indeed, using 27 genetically diverse inbred mouse strains, we previously showed that chronic haloperidol treatment (60 days) can bring on VCMs that persist for 1.5 years, that VCMs and extrapyramidal side effects are highly heritable ($0.9), and that there are strong strain effects (Crowley et al. 2012a). Genome-wide association mapping of these same 27 inbred strains nominated $50 genes for association with haloperidol-induced ADRs (Crowley et al. 2012b). Furthermore, using RNA sequencing of striatal tissue from mice chronically treated with haloperidol, we previously observed an overlap between the genetic variation underlying the pathophysiology of schizophrenia and the molecular effects of haloperidol (Kim et al. 2018).
Traditional recombinant inbred strains that are generated by crossing two parental inbred strains followed by 20 generations of inbreeding by brother-sister mating, lack genetic diversity, and are limited by the number of available strains (e.g., C57BL6/J, 129S1, BALB/c, etc.). In contrast, the Collaborative Cross (CC) (Threadgill et al. 2002;Churchill et al. 2004; Collaborative Cross Consortium 2012) is a panel of recombinant inbred (RI) mouse strains, derived from eight founder laboratory strains, that was designed as an optimized murine model of heterogeneous human populations (Churchill et al. 2004;Aylor et al. 2011;Threadgill et al. 2011;Collaborative Cross Consortium 2012). The CC captures the complexity of the mammalian genome, and is an important resource for mapping complex traits and system genetics efforts (Aylor et al. 2011;Kelada et al. 2012;Ferris et al. 2013;Gralinski et al. 2015;Gralinski et al. 2017;Shorter et al. 2017;Srivastava et al. 2017). The CC has also been a source of new models of human diseases (Rogala et al. 2014;Graham et al. 2016;Mcmullan et al. 2018;Orgel et al. 2019). By mating different CC strains, we can generate recombinant inbred inter-crosses (CC-RIX) (Zou et al. 2005;Schoenrock et al. 2018). CC-RIX are outbred but their genomes are completely reproducible (by mating their parental CC strains) and thus are an intriguing model of the genetic diversity akin to that in human populations (Zou et al. 2005;Liu et al. 2018). The genetic diversity of CC-RIX enables the study of parent-of-origin effects and their phenotypic characterization has demonstrated that there are strong strain effects in some measures (Crowley et al. 2015;Schoenrock et al. 2018). Using standard inbred lines, we previously demonstrated that we could effectively deliver haloperidol at human-like steady-state concentrations, that steady-state plasma haloperidol concentration is a trait with relatively high heritability, and that there were substantial RI strain differences in the vulnerability to VCMs (Crowley et al. 2012a;Crowley et al. 2012b;Crowley et al. 2014).
In this study, we generated 840 mice and performed extensive phenotypic characterization of 777 mice representing 73 RIX lines chronically treated with haloperidol or placebo in order to gain insight into the genetic basis of ADRs. We performed a battery of behavioral tests, including open field activity, the inclined screen test, and video recordings of VCMs, in order to examine the strain and treatment effects of chronic haloperidol exposure. Using linear mixed models we showed that RIX lines exhibited strong strain effects and strain-by-treatment interactions, supporting the hypothesis of a significant differential contribution of their genetic background to haloperidol-induced ADRs. The fact that RIX lines displayed a range of responses to haloperidol-induced ADRs, with a distribution that is comparable to the susceptibility of haloperidol-ADRs in the human population, supports their use as a model for the study of pharmacogenomics and additional phenotypes.

Mice
A total of 840 mice were tested, including 423 females (210 haloperidol, 213 placebo) and 417 males (210 haloperidol, 207 placebo). These mice represented a total of 73 RIX lines which were derived from 62 CC strains and tested across 51 batches. See Table 1 for summary information on mice with complete data for each phenotype. RIX mice used in this study were born between 5/7/2012 and 6/ 16/2014. All mice were bred from CC strains from the Systems Genetics Core Facility at the University of North Carolina (http:// csbio.unc.edu/CCstatus/index.py?run=availableLines). CC mice were crossed to generate RIX mice as shown in Figure 1 and Figure S1. Pups were weaned at 3 weeks of age and housed two animals per cage, with one randomly assigned to receive haloperidol and the other placebo. Animals were maintained on a 14 hr light/10 hr dark schedule with water and food available ad libitum. All testing procedures were conducted in strict compliance with the Guide for the Care and Use of Laboratory Animals (Institute of Laboratory Animal Resources, National Research Council 1996) and approved by the Institutional Animal Care and Use Committee of the University of North Carolina.
The study design of our 6-week haloperidol phenotyping protocol is summarized in Figure 1. Most RIX lines (63 of 73) included 12 animals, although some lines had fewer (two were composed of only two animals each) and others more (one line included 18 mice). Given the number of mice assayed in this study and the battery of tests we employed, it was unfeasible to generate and test all mice in a single batch. Our mixed models include batch as a random effect to account for this possible confounder. Sarasota, FL, USA) (Fleischmann et al. 2002) or placebo and treated for 30 days for a chronic haloperidol administration paradigm. All experimental procedures were randomized to minimize batch artifacts (Leek et al. 2010) (e.g., assignment to haloperidol or placebo; cage; order of dissection; and assay batch). Experimenters were blind to treatment status.
We have previously demonstrated that this procedure reliably yields human-like steady-state concentrations of haloperidol (typically 3.75-19 ng/ml) (Hsin-Tung and Simpson 2000) in blood plasma and brain tissue. This chronic dosing paradigm reliably results in VCMs (an established model of extrapyramidal symptoms) (Turrone et al. 2002) in multiple mouse strains (Crowley et al. 2012b;Crowley et al. 2012a;Crowley et al. 2014). Haloperidol pellets were implanted subcutaneously with a trocar under 2 min of isoflurane anesthesia to minimize handling stress and pain. Two pellets of incremental dosages were implanted 2 days apart to compensate for varying body weights and to minimize acute sedation (Crowley et al. 2012a). Placebo-treated animals were implanted with pellets containing the same matrix material but no drug. Following pellet implantation, we positioned food and water near the surface of the bedding, since movement can be compromised in a strain-specific manner due to variable degrees of haloperidol-induced extrapyramidal side effects.
Open field activity (OFA) Extrapyramidal side effects may appear as general motor deficits in mice. Open field activity is a behavioral assay that measures general locomotor activity, anxiety, and exploratory drive (Crusio 2013). OFA was measured on days -7 and +28 relative to the start of drug treatment (day 0). Spontaneous locomotor activity in the open field (Crawley 1985) was measured for 30 min using a photocell-equipped automated open field apparatus with a white Plexiglas floor and clear Plexiglas walls (Superflex system, Accuscan Instruments, Columbus, OH; arena measures 40 cm wide · 40 cm long · 30 cm high) surrounded by infrared detection beams on the X, Y and Z-axes that track the animals' position and activity over the course of the experiment. The apparatus is isolated within a 73.5 · 59 · 59 cm testing chamber fitted with overhead fluorescent lighting (lux level 14). One to two hours before testing, cages were moved from the housing room to the testing room. Following acclimation, animals were removed from their home cage, immediately placed in the corner of the open field arena, and allowed to freely explore the n■   Figure S1) in order to generate RIX lines with maximum genetic diversity. We tested up to 12 animals per RIX line: 6 males (3 haloperidol, 3 placebo) and 6 females (3 haloperidol, 3 placebo).
(b) Mice were housed two per cage from weaning with one designated for drug treatment and the other for placebo treatment. To avoid batch effects, each RIX line was tested across up to three batches. (c) RIX mice were aged for $8 weeks before being added to the phenotyping pipeline. Phenotyping occurred "pre-drug" (7 days before pellet implantation), after "acute drug" treatment (on day 4; one day after split pellet implantation), and chronic treatment with haloperidol or placebo ("chronic drug": 28-35 days after pellet implantation).
Open field activity (OFA) was done both pre-drug and after chronic drug treatment. The inclined screen test, which measures extrapyramidal side effects (EPS) was done both pre-drug and after acute drug treatment. Vacuous chewing movements (VCMs) were measured after chronic drug treatment.
apparatus for a 30-minute test interval. Four phenotypes were extracted from these activity data: total distance traveled (cm), vertical activity (number of beam breaks on the Y-axis), stereotypy (a measure of repetitive movements based on repeated breaking of the same beam), and time spent in the central region of the chamber (a measure of anxiety). Activity chambers were located inside of soundattenuating boxes equipped with houselights and fans.

Extrapyramidal side effects (EPS)
The inclined screen test (Barnes et al. 1990) was used as an index of Parkinsonian rigidity and sedation. Mice were placed on a wire mesh screen inclined at 45°and the latency to move all four paws was recorded (to a maximum of 60 sec). EPS was measured at baseline (day -5) and 24 hr after implantation of the second drug pellet (day 4), since pilot work indicated that haloperidol-induced EPS was most pronounced after acute, rather than chronic, drug treatment.
Orofacial movement scoring Video recording of vacuous chewing movements (VCMs) was carried out after 28 days post-treatment. To this end, mice were briefly anesthetized with isoflurane and restrained for 25 min using a plastic collar. Collars were made from two plastic semicircular pieces that were adjusted based on neck size and to achieve the most comfortable position for the mouse. The collar partially immobilized the mice at the neck but still permitted head movement to allow for video recording of jaw movements by JVC Everio digital camcorders. Digital videotapes were made using the protocol developed by Tomiyama et al. (Tomiyama et al. 2001). The first 10 min of video were not analyzed in order to allow the mice to adjust to the collar and to relax. The last 15 min of the video were scored for orofacial movement. Videos were randomized and scored by a single-blinded rater to increase consistency and to reduce any deviation or bias between raters. The rater was trained by an expert and a set of standard training videos used in the studies by Crowley et al. (Crowley et al. 2012a;Crowley et al. 2012b) to align the rater with the correct identification of VCMs according to the scoring from those previous studies. Drift was monitored by re-scoring random videos throughout the course of the study. The movements that were specifically analyzed and counted were tongue protrusions, jaw tremors, overt chewing movements, and subtle chewing movements. Individual events of each movement were counted. Subtle chewing movements were defined as instances of vertical jaw movement in which the inside cavity of the mouth could not be seen and the jaw was not open for a long period of time. Overt chewing movements occurred when a larger vertical movement was observed in which the cavity could be seen and the jaw was open for an extended length of time. The videos were scored using The Observer XT (Noldus Inc., Wageningen, Netherlands) observational data analysis program.

Tissue collection
Mice were killed on day +32 relative to the start of drug treatment by cervical dislocation. We collected blood plasma from haloperidol-treated mice at this time using EDTA-treated tubes in order to measure haloperidol drug concentration. Blood was centrifuged to isolate plasma. Haloperidol assays were performed using mass spectrometry at the Analytical Psychopharmacology Laboratory located at the Nathan Kline Institute for Psychiatric Research (Orangeburg, NY, USA).

Statistical models
As shown in Table 1, the phenotypes studied can be grouped into three classes depending on the nature of outcome ascertainment; correspondingly, three separate statistical models were employed. For the open field activity, EPS, and body mass phenotypes, outcomes were ascertained on two occasions (pre-and post-treatment) in both arms. VCMs were measured on a single occasion (post-treatment) in both arms. Blood plasma phenotype was also only measured posttreatment, but naturally only applies to the haloperidol arm.
For the open field, EPS, and body weight phenotypes, the difference between pre-and post-treatment measurements were taken and log-transformed for body weight. The mixed model employed for these phenotypes was: where Y refers to the vector of transformed responses, x pre to the vector of pre-treatment phenotype values, x tmt is a vector of treatment indicators, x sex controls for sex, and x sexÃtmt controls for the interaction of sex and treatment. Each g j refers to a random effect vector associated with the jth random effect. The g j 's correspond to the batch, strain, strain-by-treatment, strain-by-sex, and strain-bysex-by-treatment effects (henceforth referred to as effects 1-5, respectively). and have associated random design matrices Z 1 2 Z 5 . We assume that the g j 's are mutually independent and g j $ Nð0; s 2 j I qj·qj Þ. Further, the error term e $ Nð0; s 2 e Ã I n·n Þ, independent of the random effects. Details on the construction of the associated random design matrices (Z 1 2 Z 5 ) are included in File S1.
A single VCM phenotype was derived by summing the number subtle and overt VCM, tongue movements, and tremors, in order to give each movement equal weight. In previous work by Crowley et al. (Crowley et al. 2012a), when 27 inbred strains were treated with chronic haloperidol, the behavioral domains that were measured loaded onto two factors, one of which was primarily haloperidolinduced orofacial movements. While we did not observe the same separation in our data, we did perform a factor analysis and a principal component analysis on the four VCM variables to explore alternative data reduction approaches. When using the latent factor or the first principal component as the dependent variable, the inference was indistinguishable from that described below. For this phenotype, the following model was employed, with the term to control for the pre-treatment value excluded, as follows: Haloperidol blood concentration was log-transformed, and the following model was employed. This model controls for three random effects: batch, strain, and sex (1, 2, and 4): All models were fit in SAS/STAT 14.1 software, Version 9.4 of the SAS System for Linux (Sas Institute Inc. 2015), using PROC MIXED. This procedure allows the user to specify flexible design matrices, but the required syntax to achieve the desired correlation structure is nontrivial, and sample code is included in File S1.

Data availability
All data from this study have been deposited in the Mouse Phenome Database (https://phenome.jax.org/projects/Xenakis1). Available CC strains can be obtained from the Systems Genetics Core Facility at the University of North Carolina (http://csbio.unc.edu/CCstatus/ index.py?run=availableLines). Table S1 describes the CC strains used to generate the RIX lines that were part of the study. Figure S1 shows the mating scheme used to cross CC strains and generate RIX with maximum genetic diversity. Figure S2 shows haloperidol-induced changes in behavioral measures. Figure S3 shows the raw data for RIX mice treated with haloperidol or placebo, for the centroid time and stereotypy measures of OFA. Figure S4 shows RI strain-by-treatment and strain prediction intervals for stereotypy, centroid, and EPS. Figure S5 presents raw data and strain-by-treatment predictions for body weight of RIX mice treated with haloperidol or placebo. Figure  S6 shows CC strain-level predictions for plasma haloperidol levels. Figure S7 presents the distributions of transformed phenotypes used in our genetic mapping effort. Figure S8 presents the results of our genetic mapping effort. Figure S9 presents the results of our small simulation study to illustrate a lack of power to detect a significant locus in the presence of a strong polygenic effect. Figure S10 compares the pre-treatment distributions of various behavioral measures between the CC-RIX, the CC diallel study (Crowley et al. 2014), and a survey of 27 mouse strains (Crowley et al. 2012a;Crowley et al. 2012b). File S1 includes sample code for linear mixed models. File S2 includes the power simulation of QTL mapping. Supplemental material available at figshare: https://doi.org/10.25387/g3.9757208.

Generation of RIX lines from CC strains
In this study, we used RIX mice derived from the genetically diverse CC in order to understand the effects of haloperidol-induced ADRs, using a battery of behavioral tests as a means of doing a comprehensive phenotypical characterization. To maximize genetic diversity and test outbred mice with reproducible genomes, we generated RIX mice by crossing males and females from two different CC RI strains ( Figure 1a) in a quasi-loop design ( Figure S1). This quasi-loop design used all RI genomes available at the time, generating a mostly uniformly structured population, potentially expanding the analytical possibilities beyond simple additive models, and providing a systematic approach to detect regulatory variation including parent-oforigin effects. We performed a battery of behavioral tests in 840 mice from 73 RIX lines (derived from 62 CC strains) treated with haloperidol or placebo in order to monitor the development of ADRs. Figure 1c details the phenotyping pipeline for this study. Mice were entered into the phenotyping pipeline at 8 weeks of age and behavioral assays were carried out as detailed in Materials and Methods. Below, we provide both descriptive as well as model-based results for the behavioral phenotypes examined using the CC-RIX.

Descriptive data
CC-RIX mice exhibit reduced open field activity after chronic treatment with haloperidol: The raw data are visualized at the marginal (that is, not strain-specific) level in a series of density plots (Figures 2a-d) that illustrate the densities for each phenotype. Open field activity decreased from pre-to post-treatment for mice treated with placebo or haloperidol (Figure 2a-c). This is visualized as a leftshift in the distribution curve, for all phenotypes except for stereotypy (Figure 2d). This was expected for both cohorts, as mice are known to spend less time exploring the open field chamber when re-exposed, due to lack of novelty. For distance traveled (Figure 2a; Figure S2a) and vertical activity (Figure 2b; Figure S2b), the left-shift is far more pronounced for the haloperidol treated mice, suggesting a strong average treatment effect. The left-shift is more modest for time in centroid (Figure 2c; Figure S2c), and not observed at all for stereotypy (Figure 2d; Figure S2d). Note that this implies only that there is a lack of a marginal effect. For stereotypy, although the raw data do not suggest a treatment effect at the mean, treatment still affects individual RIX strains differently (see below).
Acute haloperidol exposure results in increased time on inclined screen test: The raw data from the inclined screen test (Figure 2e), suggest an extreme effect of haloperidol. For pre-treatment and placebo-treated mice the series cluster around zero; these mice move almost immediately after being placed on the inclined screen. Conversely, the distribution for the haloperidol treated mice is bimodal, exhibiting an extremely fat right tail, enriched with a noticeable proportion of animals who are unable to move their four paws at all within the 60-second experiment.
Chronic haloperidol treatment increases incidence of VCMs: The raw data for VCM are visualized in the distribution curves in Figure  2f. Haloperidol-treated RIX mice show increased susceptibility to the emergence of VCMs, as evidenced by the fattening of the right tail in the haloperidol series vis-à-vis the placebo series. Note a small but noticeable proportion of animals from both treatment paradigms do not experience any VCMs.

Strain-level descriptive figures: Figures 2a-f and Figures S2a-d pro-
vide an overview at the population-average level. Figure 3, Figure S3, and Figure S5b show the raw data at the RIX line level. For each line the distribution of the phenotype (or change in phenotype, where appropriate), is visualized in boxplots where the first and third quartiles are visualized at the bottom and top of the boxes, respectively, and the median as a black hash mark. The pink boxes correspond to the placebo data and the blue to haloperidol. The strains are organized by increasing difference between these strainlevel medians. These differences in medians are visualized as the monotonically increasing series of red dots. Chronic haloperidol exposure had divergent effects across OFA measures. For distance traveled and vertical activity, more than half of the RIX lines display a reduction in these OFA measures after haloperidol exposure ( Figure  3a-b), while less than a quarter of RIX lines displayed changes in stereotypy ( Figure S3a), and centroid duration ( Figure S3b) seemed unaffected in most lines. The raw data for EPS showed that the majority of RIX lines showed a delayed response in the inclined screen test after haloperidol treatment (Figure 3c). Lastly, the raw data for the VCM measure showed that more than half of the RIX lines displayed higher incidence of VCMs after haloperidol treatment (Figure 3d).

Model-fitting results
Hypothesis Tests: The main model results are in Table 2, which includes results for all eight phenotypes assayed in our experiment. For ease of readability, some p-values were also not explicitly listed, but rather the level at which they were significant is simply indicated in footnotes. The first four rows of the table contain the fixed effect estimates with p-values in parentheses. These p-values were computed using the Kenward-Roger approximation to compute the denominator degrees of freedom. The next four rows contain the estimates for the variance components of interest (the batch effect has been excluded), with the p-values computed from the likelihood ratio tests (LRTs). The next four rows correspond to test statistics and p-values that involve multiple parameters; the numbers in the rows corresponding to fixed effects are F-statistics, and those in the rows corresponding to tests of random effects are likelihood ratio statistics.
One salient result is the strong marginal (fixed) effect of treatment, which is highly significant for most phenotypes. Confirming the results from the exploratory analyses above, it is not significant in the centroid and stereotypy models, however, and only weakly significant in the model for body weight. Note, however, that the strain effect is highly significant for all models, including centroid and stereotypy. The random treatment effects are also significant for all phenotypes except body weight ( Figure S5). Table 2 provide evidence of the existence of strain and strain-by-treatment interactions. However, determining which parental (RI) strains contribute most is also of interest; i.e., we are interested in the levels of the random effects not just the estimates of the variance components. An explanation of the approach we employed to illustrate these parental strain contributions benefits from some notational simplification. In general, the models we fit can be written in the most concise matrix form as follows:

Best linear unbiased predictions (BLUPS) to visualize CC strain level predictions: The main model results included in
where matrices are visualized in bold. Y corresponds to the phenotype vector of interest, X is a matrix that accommodates all the fixed effect covariates with associated parameter vector b. In this formulation, Z and g are concatenations of the random design matrices and random effects vectors. All distributional assumptions from above remain. The variance of Y, denoted V, can therefore be written as follows: where G is the variance of g. When the estimate of G is full rank (i.e., when none of the model's variance components are estimated at zero), the predictions of the random effects vector are: These predictions are commonly referred to as the Best Linear Unbiased Predictors (BLUPs) of g. We extracted the relevant portion of this BLUP vector (recall g contains all the random effects in the model) and plotted these strain level predictions in ascending order for all phenotypes (Figure 4, Figure S4, Figure S5c-d, and Figure S6). The absolute height of each bar corresponds to its 95% prediction interval. Due to the small number of samples per RIX, there is little power to differentiate between the RI effects: in comparing two strain predictions, their intervals tend to overlap, except when comparing strains at the extremes. Under the assumption of additive strain effects, these strain-level predictions can be used to predict which RIX lines will exhibit extreme phenotypes.

Strong correlations across phenotypes and behavioral measures:
We can leverage information across phenotypes to find strains that have consistently extreme predictions, and use this information to actually inform future experiments by selecting strains that have the most extreme results. Here, we show that the strain predictions are correlated across phenotypes, as are strain-by-treatment interactions.
To assess this correlation, we used the Pearson correlation coefficient and present the results in the correlograms in Figure 5. The first panel applies to the strain predictions and the second to the strain-bytreatment interactions (note that the blood haloperidol concentration appears only in this second panel; although the relevant random effects were termed "strain" effects in the model for this phenotype, they were only predicted in haloperidol-treated mice). For the additive strain effects, there are two salient results. The distance, vertical and body weight phenotypes are correlated, with changes in horizontal distance correlating highly with changes in vertical beams broken. Further, those strains with the largest decreases in the phenotypes exhibited the smallest increases in weight, as expected.
The other two open field phenotypes exhibit some correlation with VCM. Increased time spent in centroid is associated with fewer VCM, while increased stereotypy is associated with increased VCM. The correlation manifests in the strain-by-treatment effect as well, with the effect of treatment on changes in distance, vertical, and centroid QTL mapping of behavioral traits: We performed genetic mapping for each of the eight phenotypes using R/qtl2 (Broman et al. 2019).
For the six phenotypes with pre-and post-treatment measurements we computed the difference in pre-and post-treatment phenotypes as above, and then took the difference between cagemate values (subtracting the value for the haloperidol treated mouse from the value for the placebo-treated mouse). We controlled for sex and the average pre-treatment value (the average of the two cagemate values) as fixed effects. For the VCM phenotype, we used the difference in total VCM between cagemates as the phenotype and controlled for sex as a fixed effect. For plasma concentration, no differencing was necessary since, by definition, this phenotype only applies to mice treated with haloperidol; we controlled for sex as a fixed effect. All models included a random effect that accounted for the genetic similarity between the RIX samples. The model was fit at every marker on the MegaMUGA genotyping platform (Morgan et al. 2015) in an attempt to map the genetic effect of the eight CC founder alleles. The distributions of the (transformed) phenotypes are shown in Figure  S7 and the mapping results in Figure S8. The significance thresholds visualized were derived based on 100 permutations; we identified no compelling LOD peaks ( Figure S9). The inability to find significant associations is not unexpected, as these traits appear to be polygenic and we would likely need a much larger sample size to achieve enough power to map these. A small simulation study is included in File S2 to illustrate this point.

Consistent distribution with founder and CC strains:
We also sought to contextualize our findings to both the founder and CC strains. We have previously explored the genetic architecture of ADR-associated behavioral measures using a diallel cross of the eight genetically diverse founder strains of the CC, and shown that $70% of the variance in EPS is explained by parent-of-origin and additive effects (Crowley et al. 2014). We compared the pre-treatment distributions of the phenotypes for which comparable data were available in our data, the diallel cross, and the aforementioned survey of 27 mouse strains (Crowley et al. 2012a;Crowley et al. 2012b;Crowley et al. 2014). As shown in Figure S10, the distributions largely accord with our expectations; for the distance and vertical phenotypes, the right tails of the RIX distributions are thicker than those for the other samples, consistent with their expected hybrid vigor. The same pattern is observed for VCM. Conversely, the mice in the Strain Survey dataset are the heaviest, as they are enriched with NZO samples, which were specifically bred to express this phenotype.

DISCUSSION
Our study employed the genetically diverse CC-RIX mice to elucidate the genetic basis of antipsychotic ADRs. We performed a battery of behavioral tests in 840 mice from 73 RIX lines treated with haloperidol or placebo in order to monitor the development of ADRs. CC-RIX lines displayed a wide-ranging response to haloperidol treatment induced ADRs as captured in pre-and post-drug behavioral measures. On average, treatment had a strong effect for every phenotype, with the exceptions of centroid and stereotypy. However, note that even for these phenotypes, there was significant betweenstrain variability. The majority of the CC-RIX lines exhibited reduced distance traveled and vertical activity upon haloperidol treatment, with about a third of the RIX lines exhibiting little change in these measures after exposure to haloperidol (Figure 3a-b). Centroid time and stereotypy measures did not appear to be as sensitive to haloperidol induced response ( Figure S3). The majority of RIX lines showed a delayed response in the inclined screen test (EPS) after haloperidol treatment ( Figure 3c) and more than half of the RIX lines displayed higher incidence of VCMs after haloperidol treatment ( Figure 3d). Interestingly, the prevalence of ADRs in the CC-RIX population is analogous to what has been observed in humans, where 20-43% of patients treated with antipsychotics can develop tardive dyskinesia (Halliday et al. 2002;Soares-Weiser and Fernandez 2007) We used linear mixed models to test for the existence of strain and treatment effects. Of note, all phenotype models that included a pretreatment fixed effect demonstrated significant regression to the mean; subjects with higher pre-treatment scores experienced larger reductions in the phenotype. We observed highly significant strain effects for almost all behavioral measurements investigated (P , 0.001). Further, we observed strong strain-by-treatment interactions for most phenotypes, with particularly strong effects (P , 0.001) for change in distance traveled and change in vertical activity, as well as change in EPS. When viewing the BLUPs, it is important to keep in mind that given the small sample sizes for each parental strain, comparisons between the effects of individual pairs of parental strains are not necessarily significant, even when the overall effects are highly significant. To draw closer attention to the treatment effect BLUPs, we indicate strains that are at the extreme for those phenotypes with a substantial mean shift in Figure 3 and Figure S3 (in the direction consistent with this mean shift) in the rightmost columns of Table S1. We flag strains that are in the most extreme 10%, finding that 27 strains had extreme responses for at least one of these phenotypes, with eight of these being extreme for multiple phenotypes.
Despite extensive efforts, we have a limited understanding of the heritability of motoric ADRs in humans (Zhang and Malhotra 2011;Drago et al. 2012;Crowley et al. 2014). As shown in Table 2, estimates of heritability for the phenotypes measured using the CC-RIX ranged from 0.21 (for change in body weight) to 0.4 (for both number of VCMs and change in distance traveled). Using a similar chronic n■ haloperidol treatment paradigm in 27 genetically diverse inbred strains, we previously showed that haloperidol-induced ADRs (including VCMs and EPS) are highly heritable ($0.9), with strain being a major predictor of phenotypic variation independently of haloperidol plasma levels (Crowley et al. 2012a). The discrepancy between the heritability estimates found for VCMs between these studies is likely due to the genetic diversity present in the CC-RIX, when compared to the 27 inbred lines, and the lower heritability found in the CC-RIX (h 2 $ 0.4), which is likely closer to the heritability of the analogous trait (TD) in humans.
One major goal of our study was to identify specific genes driving antipsychotic ADRs in the CC-RIX mice. Extensive work has been done by us and others to map ADRs, including haloperidol-induced catalepsy, using other mouse strains and crosses (Kanes et al. 1996;Patel and Hitzemann 1999;Rasmussen et al. 1999;Hofstetter et al. 2008;Crowley et al. 2012b). For example, work by Kanes et al. who used the BXD recombinant inbred series (C57Bl/6J and DBA/2J) identified a QTL on chromosome 9 near Drd2 (Kanes et al. 1996). In our case, we performed genetic mapping for each of the eight phenotypes using R/qtl2 ( Figures S7-10) (Broman et al. 2019). Because our goal was to map the underlying genetic effect of treatment response, slightly different models than those described above were employed. The inability to find significant associations is not unexpected, as these traits appear to be polygenic and we would likely need a much larger sample size to achieve enough power to map these.
Our study supports the use of the CC-RIX mouse population for pharmacogenomics studies. We observed that 30% of CC-RIX mice exhibited a sensitivity to haloperidol exposure, evidenced by VCMs and EPS, closely matching the proportion of humans that develop TD after treatment with this antipsychotic. Given the significant impact that antipsychotic ADRs and subsequent discontinuation of treatment can have on patients, making headway in our understanding of the genetic basis for the susceptibility to antipsychotic ADRs could advance the development of safer and more effective therapeutic approaches for the treatment of schizophrenia.

ACKNOWLEDGMENTS
Major funding was provided by National Institute of Mental Health/ National Human Genome Research Institute Center of Excellence for Genome Sciences grants (P50MH090338 and P50HG006582). The Collaborative Cross project is also supported by the University Cancer Research Funds granted to the Lineberger Comprehensive Cancer Center (MCR012CCRI). We thank the High-Throughput Sequencing Facility and the Systems Genetics Core at UNC for their assistance with this project. PFS reports the following financial interests. Current: Lundbeck (advisory committee, grant recipient). Past three years: Pfizer (scientific advisory board). These are not directly related to the research presented in this paper.