Mean-Variance QTL Mapping Identifies Novel QTL for Circadian Activity and Exploratory Behavior in Mice

We illustrate, through two case studies, that “mean-variance QTL mapping”—QTL mapping that models effects on the mean and the variance simultaneously—can discover QTL that traditional interval mapping cannot. Mean-variance QTL mapping is based on the double generalized linear model, which extends the standard linear model used in interval mapping by incorporating not only a set of genetic and covariate effects for mean but also set of such effects for the residual variance. Its potential for use in QTL mapping has been described previously, but it remains underutilized, with certain key advantages undemonstrated until now. In the first case study, a reduced complexity intercross of C57BL/6J and C57BL/6N mice examining circadian behavior, our reanalysis detected a mean-controlling QTL for circadian wheel running activity that interval mapping did not; mean-variance QTL mapping was more powerful than interval mapping at the QTL because it accounted for the fact that mice homozygous for the C57BL/6N allele had less residual variance than other mice. In the second case study, an intercross between C57BL/6J and C58/J mice examining anxiety-like behaviors, our reanalysis detected a variance-controlling QTL for rearing behavior; interval mapping did not identify this QTL because it does not target variance QTL. We believe that the results of these reanalyses, which in other respects largely replicated the original findings, support the use of mean-variance QTL mapping as standard practice.

to QTL mapping (Rönnegård and Valdar 2011) and the omnibus test of Cao et al. (2014) for genetic association. [See also refs in Rönnegård and Valdar (2012), and recent developments of Soave and Sun (2017); Dumitrascu et al. (2018).] Both of these approaches can detect mQTL, vQTL, and QTL influencing some combination of phenotype mean and variance (mvQTL) (Corty and Valdar 2018a). Yet despite the potential of these methods to detect QTL traditional methods overlook, they remain underutilized.
Barriers to widespread adoption include a lack of proven potential in real data applications, as well as the absence of software that is interoperable with existing infrastructure. Apart from those barriers, one reasonable concern is that a novel approach might fail to identify known QTL, adding needless complexity to the interpretation of alreadyreported QTL. This concern should be largely allayed by the nature of the DGLM as an extension of the linear model, simplifying to the latter when variance heterogeneity is absent [with similar arguments holding for the omnibus test of Cao et al. (2014)]. In fact, rather than add complexity, the DGLM automatically classifies QTL into mQTL, vQTL, or mvQTL, clarifying the genotype-phenotype relationship.
Here we demonstrate, with two real data examples available from the Mouse Phenome Database (Bogue et al. 2017), that QTL mapping using the DGLM, which we term "mean-variance QTL mapping", largely replicates the results of standard QTL mapping and detects additional QTL that the traditional analysis does not. In two companion articles, we demonstrate typical usage of R package vqtl, which implements mean-variance QTL mapping (Corty and Valdar 2018b), and describe the how mean-variance QTL mapping and its associated permutation procedures reliably detects QTL in the face of variance heterogeneity arising from background factors (i.e., genetic or non-genetic factors outside the targeted QTL) (Corty and Valdar 2018a).

STATISTICAL METHODS
Traditional QTL mapping based on the standard linear model (SLM) The traditional approach to mapping a quantitative trait in an experimental cross with no population structure (e.g., an F2 intercross or backcross) involves fitting, at each locus in turn, a linear model of the following form. Letting y i denote the phenotype value of individual i, this phenotype is modeled as where s 2 is the residual variance, and the expected phenotype mean, m i ; is predicted by effects of QTL genotype and, optionally, effects of covariates.
In the reanalyses performed here, m i is modeled to include a covariate of sex and additive and dominance effects of QTL genotype, that is, where m is the intercept, b sex is the sex effect, with sex i indicating (0 or 1) the sex of individual i, and b a and b d are the additive and dominance effects of a QTL whose genotype is represented by a i and d i defined as follows: when QTL genotype is known, a i is the count (0,1,2) of one parental allele, and d i indicates heterozygosity (0 or 1); when QTL genotype is inferred based on flanking marker data, as is done here, a i and d i are replaced by their corresponding probabilistic expectations (Haley and Knott 1992;Martínez and Curnow 1992). The evidence for association at a given putative QTL is based on a comparison of the fit of the model above with that of a null model that is identical except for the QTL effects being omitted. These models and their comparison we henceforth refer to as the standard linear model (SLM) approach.

Mean-variance QTL mapping based on the double generalized linear model (DGLM)
The statistical model underlying mean-variance QTL mapping, the double generalized linear model (DGLM ;Smyth 1989 andValdar 2011), elaborates the SLM approach by modeling a potentially unique value of s 2 for each individual, as where m i has the same meaning as in the SLM, but now s 2 i is linked to its own linear predictor v i as where the exponentiation ensures that s i is always positive, though v i is unconstrained. The linear predictors for m i and v i are modeled as mean : where m, a i ; d i , sex i ; and the b's are as before, m v is an intercept representing the (log of the) "baseline" residual variance, and g a ; g d ; and g sex are the effects of the QTL and covariates on v i : The evidence for a QTL association is now defined through three distinct model comparisons, corresponding to testing for an mQTL, a vQTL, or an mvQTL. In each case, the fit of the "full" model in Equation 1 is compared with that of a different fitted null: for the mQTL test, the null model omits the QTL effects on the mean (i.e., b a ¼ b d ¼ 0); for the vQTL test, the null model omits the QTL effects on the variance (i.e., g a ¼ g d ¼ 0); and for the mvQTL test, the null model omits QTL effects on both mean and variance (i.e., b a ¼ b d ¼ g a ¼ g d ¼ 0). These tests are detailed in Corty and Valdar (2018a).

Genomewide significance and FWER-adjusted p-values
The model comparisons described above constitute the SLM test and the three DGLM-based tests and each produces a likelihood ratio (LR) statistic. These LR statistics are converted to p-values that are adjusted for the familywise error rate (FWER) across loci, i.e., p-values on the scale of genomewide significance. This adjustment is performed separately for each test by calculating an empirical distribution for the LR statistic under permutation, much in the spirit of Churchill and Doerge (1994) but with some modifications, namely that different tests have differently structured permutations. Briefly, let G i be the full set of genetic information for individual i, that is, the genotypes or genotype probabilities across all loci. For the SLM and mvQTL tests, we define a permutation as randomly shuffling the G i 's across individuals; for the mQTL test, the permutations apply this shuffle only to the genotype information in the full model's mean component; for the vQTL test, the permutations apply the shuffle only to the genotype information in the full model's variance component. For a given test, for each permutation we calculate LR statistics across the genome and record the maximum; the maxima of over all permutations is fitted to a generalized extreme value distribution, and the upper tail probabilities of this fitted distribution are used to calculated the FWER-adjusted p-values for the LR statistics in the unpermuted data [see Dudbridge and Koeleman (2004), and, e.g., Valdar et al. 2006; more details in Corty and Valdar (2018a)]. An FWER-adjusted p-value can be interpreted straightforwardly: it is the probability of observing an association statistic this large or larger in a genome scan of a phenotype with no true associations.

Data Availability
All data and scripts used to conduct the analyses presented here and plot results are archived in a public, static repository at with DOI: 10.5281/ zenodo.1453905. Specifically, the raw data files are: • 1 Kumar2014:csv The phenotype and genotype data from Kumar et al. (2013) that was reanalyzed. This dataset is also available from the Mouse Phenome Database (Bogue et al. 2017) at https://phenome.jax.org/projects/Kumar1. • 4 Bailey2008:csv The phenotype and genotype data from Bailey et al. (2008) that was reanalyzed. This dataset is also available from the Mouse Phenome Database at https://phenome.jax.org/projects/ Bailey1. • 9 actogram data The raw data on circadian activity from Kumar et al. (2013) that was used to plot actograms The analysis and plotting scripts are: • 2_run_Kumar_scans.R This script runs genome scans with R/qtl and R/vqtl on the data from Kumar et al. (2013 Kumar et al. (2013) intercrossed C57BL/6J and C57BL/6N, two closelyrelated strains of C57BL6 that diverged in 1951, approximately 330 generations ago. Due to recent coancestry of the parental strains, this cross is termed a "reduced complexity cross", and their limited genetic differences ensure that any identified QTL region can be narrowed to a small set of variants bioinformatically. The intercross resulted in 244 F2 offspring, 113 female and 131 male, which were tested for acute locomotor response to cocaine (20mg/kg) in the open field. One to three weeks following psychostimulant response testing, the mice were tested for circadian wheel running activity.
Analysis of wheel running data were carried out using ClockLab software v6.0.36. For calculation of activity, 20 day-epoch in DD was used in order to have standard display between actograms. Analysis of other circadian measures such as period (tau) or amplitude were carried out using methods previously described (Shimomura et al. 2001). All animal protocols were approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Texas Southwestern Medical Center Traditional QTL mapping with the SLM, reported in Kumar et al. (2013), detected a single large-effect QTL for cocaine-response traits on chromosome 11, but no QTL for circadian activity. A later study by another group nonetheless observed that the circadian activity of the two strains showed significant differences (Banks et al. 2015).

Reanalysis with traditional QTL mapping and meanvariance QTL mapping
For the cocaine response traits, traditional QTL mapping and meanvariance QTL mapping gave results that were nearly identical to the originally-published analysis in Kumar et al. (2013) (Figure S1).
For the circadian wheel running activity trait, however, traditional QTL mapping identified no QTL (Figure 1 in green) but mean-variance QTL mapping identified one QTL on chromosome 6 ( Figure 1 in blue, black, and red). In this case, all three tests were statistically significant, but the most significant was the mQTL test (blue), so we discuss it as an mQTL. The most significant genetic marker was rs30314218 on chromosome 6, at 18.83 cM, 40.0 Mb, with a FWER-controlling p-value of 0.0063. The mQTL explains 8.4% of total phenotype variance by the traditional definition of percent variance explained (e.g., Broman and Sen 2009). Understanding the Novel QTL Though they test for the same pattern, the mQTL test of meanvariance QTL mapping identified a QTL where the traditional QTL test did not. This discordance may arise when there is variance heterogeneity in the mapping population. In this case, mice homozygous for the C57BL/6N allele at the mQTL have both higher average wheel running activity and lower residual variance in wheel running activity than mice with other genotypes (Figure 2a).
The identification of this QTL by mean-variance QTL mapping but not traditional QTL mapping can be understood by contrasting how the DGLM and SLM fit the data at this locus. Figure 2 (a) Average wheel speed (revolutions/minute) of all mice. It is visually apparent that female mice had higher circadian wheel running activity than male mice and that mice that homozygous for C57BL/6N had higher circadian wheel running activity and less intra-genotype variation. Large dots indicate the mice whose activity is shown in actogram form (males in Figure 3; all in Figure S2). (b) Predicted mean and variance of mice according to sex and allele at the QTL. What was visually apparent in (a) is captured by the DGLM. The estimated parameters relating to mice that are homozygous for the C57BL/6N allele imply a higher expected value and a lower residual variance than the other two genotype groups. Black x's indicate the estimates from the SLM, very similar to the DGLM estimates in the horizontal (mean) axis, but homogeneous in the vertical (variance) axis.

Figure 3
Double-plotted actograms illustrate the variation in wheel running activity of male mice based on their genotype at rs30314218. On reading a single actogram: An actogram illustrates the activity of a single mouse over the course of an experiment. Each day of the experiment is represented by a histogram, with bin width of six minutes. Histograms are stacked vertically. Additionally, each day is shown twice (repeated horizontally) so that there is no time of day that is illegible due to the plot edges. Yellow box indicates when lights were on. On reading this six-actogram plot: Recall that the DGLM estimates a unique mean and standard deviation (SD) for each genotype. The mice whose actograms are shown here had an activity level that is one genotype-specific SD greater than (top) or less than (bottom) the genotype-specific mean. The difference between the two is much less in the C57BL/6N homozygotes than in the other genotypes, reflecting the decreased phenotype variance among C57BL/6N homozygotes. The animals shown in this figure are marked with large blue circles in Figure 2a. A larger figure that also includes female mice as well as the ID's of all plotted mice are in the supplement ( Figure S2 and Table 2).
For the SLM, a single value of the residual standard deviation s is estimated for all mice. Approximately 25% of the mice are homozygous for the C57BL/6N allele, so s is estimated mostly based on heterozygous mice and homozygous C57BL/6J mice. The SLM estimateŝ s ¼ 7:83, a slight underestimate for some genotype-sex combinations, and a drastic overestimate for the homozygous C57BL/6N of both sexes (Figure 2b). With s overestimated for the C57BL/6N homozygotes, the addition of a locus effect to the null model results in only a limited increase in the likelihood, one that could reasonably be caused by chance alone. For the DGLM, six different values of s are estimated, one for each genotype-sex combination (Figure 2b). With a betterestimated (lower)ŝ for the C57BL/6N homozygotes, the addition of the locus effect to the null model results in a greater increase in the likelihood, one that is very unlikely due to chance alone.
A simulation based on the estimated coefficients shows that at a false positive rate of 5 · 10 24 , relevant for genome-wide significance testing, the SLM has 61% power to reject the null at this locus and the DGLM has 90% power (See file 8_power_simulations.R).

Variant Prioritization
Reduced complexity crosses allow variant prioritization to proceed quickly because of the number of segregating variants is small. Using 1000 nonparametric bootstrap resamples, the QTL interval was estimated as 13.5-23.5 cM (90% CI), which translates to physical positions of 32.5 -48.5 Mb using Mouse Map Converter's sex averaged Cox map (Cox et al. 2009). Since this interval contains no genes or previously identified QTL shown to regulate circadian rhythms, we prioritized candidates by identifying variants between C57BL/6J and C57BL/6NJ based on Sanger mouse genome database (Keane et al. 2011;Simon et al. 2013), which yielded 463 SNPs, 124 indels, and 3 structural variants (Table 1).
Of these variants, none of the indels or structural variants were nonsynonymous. Two SNPs were predicted to lead to missense changes (T to A at position 6: 39400456 in Mkrn1, and A to A/C at 6:48486716 in Sspo). The variant in Sspo was a very low confidence call and therefore likely a false positive.
The Mkrn1 (makorin ring finger protein 1) variant is a mutation in C57BL/6J that changes a highly conserved ( Figure S3 and Figure S4) tyrosine to asparagine with rsID rs30899669. It was determined to be the best candidate variant in the QTL interval. The Mkrn1 protein is a ubiquitin E3 ligase with zinc finger domains with poorly defined function (Kim et al. 2005). It is expressed at low levels widely in the brain according to Allen Brain Atlas and EBI Expression Atlas (Lein et al. 2007;Kapushesky et al. 2009;McWilliam et al. 2013). Functional analysis will be necessary to experimentally confirm that this variant in Mkrn1 is indeed the causative mutation that led, in a dominant fashion, to the decreased expected value and increased variance of circadian wheel running activity observed in mice with at least one copy of the C57BL/6J haplotype in the QTL region in this study. Bailey et al. (2008) intercrossed C57BL/6J and C58/J mice, two strains known to be phenotypically similar for anxiety-related behaviors, as a control cross for an ethylnitrosourea mutagenesis mapping study. The intercross resulted in 362 F2 offspring, 196 females and 166 males. Six open-field behaviors were measured at approximately 60 days of age in a 43cm by 43cm by 33cm white arena for ten minutes. All phenotypes were transformed with the rank-based inverse normal transform to limit the influence of outliers. The authors reported 7 QTL spread over five of the six measured traits, but none for rearing behavior.

Summary of Original Study
Reanalysis with SLM and DGLM SLM-based QTL analysis replicated the originally-reported LOD curves. Significance thresholds to control FWER at 0.05 were estimated by 10,000 permutations, using the method described in the original publication, but found to be meaningfully higher than the originallyreported thresholds. Of the 7 originally-reported QTL, 3 exceeded the newly-estimated thresholds ( Figure S5).
The DGLM-based reanalysis was initially conducted with the rank-based inverse normal transformed phenotypes, to maximize the comparability with the original study. This reanalysis largely replicated the results of the SLM-based analysis and identified a statistically-significant vQTL for rearing behavior on chromosome 2 (Figure 4 and Figure S5). The top marker under the peak was at 38.6cM and 65.5Mb.
There are well-known and well-founded concerns that inappropriate scaling of phenotypes can produce spurious vQTL (Rönnegård and Valdar 2012;Sun et al. 2013;Shen and Ronnegard 2013). Therefore, the rearing phenotype was analyzed under a variety of additional transforms: none, log, square root, and 1 4 th power (the transformation recommended by the Box-Cox procedure). Because the trait is a "count" and a positive mean-variance correlation was observed, the trait was further analyzed with a Poisson double generalized linear model with its canonical link function (log). In all cases, the same genomic region on chromosome 2 was identified as a statistically significant vQTL (p , 0:01) ( Figure S6, Figure S7 and Figure S8). Though all transformations yielded similar results, we highlight the Box-Cox transformed analysis recommended for transformation selection in Rönnegård and Valdar (2011).
n Table 2 The characteristics of the mice plotted in Figure 3 genotype at rs30314218 sex activity in the DD (rev/min) Understanding the Novel QTL In this case, the DGLM-based analysis identified a vQTL, a pattern of variation across genotypes not targeted by traditional, SLM-based, QTL analysis. The phenotype values, when stratified by genotype at the top locus, illustrate clear variance heterogeneity (Figure 5a). The effects and their standard errors estimated by the DGLM fitted at the top locus corroborate the impression from simply viewing the data, that the locus is a vQTL but not an mQTL (Figure 5b).

DISCUSSION
We have demonstrated through two case studies that mean-variance QTL mapping based on the DGLM expands the range of QTL that can be detected, including both mQTL at loci that exhibit variance heterogeneity and vQTL. In an era where ever more complete and complex data on biological systems is becoming available, this modest elaboration of an existing approach represents a step toward the broader goal of characterizing the wide array of patterns of association between genotype, environment, and phenotype. In the reanalysis of Kumar et al., mean-variance QTL mapping identified the same QTL as traditional, SLM-based QTL mapping for cocaine response traits and one novel mQTL for a circadian behavior trait. Such an mQTL would likely have been detected by a traditional QTL analysis with a larger mapping population: Through simulation, we estimated that the additional power to detect the mQTL was equivalent to the power increase that would have come from increasing the sample size by 100 mice, from 244 to 350 (See file 8_power_simulations.R). Given the considerable effort and expense associated with conducting an experimental cross or expanding the size of the mapping population, there seems to be little to be gained by omitting a DGLM-based analysis.
In the reanalysis of Bailey et al., mean-variance QTL mapping identified a novel vQTL for an exploratory behavior. A vQTL such as this would not be detected by the traditional QTL analysis no matter how large the mapping population because the pattern is entirely undetectable by the SLM.
The identification of a vQTL raises important issues related to phenotype transformation and the interpretation of findings, but both are manageable, as we have illustrated here. The criticism that a spurious vQTL can arise as the result of an inappropriate transformation is based on the observation that when genotype means are unequal, there always exists a (potentially exotic) transformation that diminishes the extent of variance heterogeneity (Sun et al. 2013). Thus, any other transformation (including none at all) can be seen as inflationary toward variance heterogeneity. In this context, however, an "inappropriate transformation" leads not to the misclassification of a non-QTL as a QTL, but an mQTL as a vQTL.
To the extent that the goal of QTL mapping is to understand a trait's genetic architecture, this criticism is valid and should be addressed by considering a wide range of transformations, alternative models, and parameterizations. To the extent that the goal of QTL mapping is to identify genomic regions that contain genes and regulatory factors that influence a trait in any way, we argue that such a misclassification is  often irrelevant. Whether we pursue bioinformatic follow-up to identify QTN in a region because it was identified as an mQTL or a vQTL need not change our downstream efforts.
In summary, we advocate for the use of mean-variance QTL mapping not as an additional flourish to consider after conducting an SLM-based QTL mapping effort, but rather as a drop-in replacement. This approach should not be too alienwhen variance heterogeneity is absent, it simplifies to the well-known SLM-based approach. Full-featured software that implements this approach is described in a companion article (Corty and Valdar 2018b).
Last, we note an additional benefit conferred by mean-variance QTL mapping not discussed in depth here. Variance heterogeneity can also derive from factors acting in the "background", that is, arising from experimental or biological variables that are outside the main focus of testing but that nonetheless predict phenotypic variability and thereby inform the relative precision of one individual's phenotype over another. In the case studies presented here, the only background factor considered was sex. But, more generally, any factor that a researcher considers as a potentially important covariate that should be modeled can be included not only as a mean covariate (as with the SLM) but also as a variance covariate. As described in a companion article, Corty and Valdar (2018a), this practice has the potential to both deliver increased power to detect QTL (mQTL, vQTL, and mvQTL) as well as increase the reproducibility of published QTL.