Testing Pleiotropy vs. Separate QTL in Multiparental Populations

The high mapping resolution of multiparental populations, combined with technology to measure tens of thousands of phenotypes, presents a need for quantitative methods to enhance understanding of the genetic architecture of complex traits. When multiple traits map to a common genomic region, knowledge of the number of distinct loci provides important insight into the underlying mechanism and can assist planning for subsequent experiments. We extend the method of Jiang and Zeng (1995), for testing pleiotropy with a pair of traits, to the case of more than two alleles. We also incorporate polygenic random effects to account for population structure. We use a parametric bootstrap to determine statistical significance. We apply our methods to a behavioral genetics data set from Diversity Outbred mice. Our methods have been incorporated into the R package qtl2pleio.

multiparental populations can have complex patterns of relatedness among subjects, and failure to account for these patterns of relatedness may lead to spurious results (Yang et al. 2014). Second, previous tests allowed for only two founder lines (Jiang and Zeng 1995). Finally, Jiang and Zeng (1995) assumed that the null distribution of the test statistic follows a chi-square distribution.
We developed a pleiotropy vs. separate QTL test for two traits in multiparental populations. Our test builds on research that Jiang and Zeng (1995), Knott and Haley (2000), Tian et al. (2016), and Zhou and Stephens (2014) initiated. Our innovations include the accommodation of k founder alleles per locus (compared to the traditional two founder alleles per locus) and the incorporation of multivariate polygenic random effects to account for relatedness. Furthermore, we implemented a parametric bootstrap test to assess statistical significance (Efron 1979;Tian et al. 2016). We focus on the case that two traits are measured in the same set of subjects (Design I in the notation of Jiang and Zeng (1995)).
Below, we describe our likelihood ratio test for pleiotropy vs. separate QTL. In simulation studies, we find that it is slightly conservative, and that it has power to detect two separate loci when the univariate LOD peaks are strong. We further illustrate our approach with an application to data on a pair of behavior traits in a population of 261 DO mice (Logan et al. 2013;Recla et al. 2014).

METHODS
Our strategy involves first identifying two traits that map to a common genomic region. We then perform a two-dimensional, two-QTL scan over the genomic region, with each trait affected by one QTL of varying position. We identify the QTL position that maximizes the likelihood under pleiotropy (that is, along the diagonal where the two QTL are at a common location), and the ordered pair of positions that maximizes the likelihood under the model where the two QTL are allowed to be distinct. The logarithm of the ratio of the two likelihoods is our test statistic. We determine statistical significance with a parametric bootstrap.

Data structures
The data consist of three objects. The first is an n by k by m array of allele probabilities for n subjects with k alleles and m marker positions on a single chromosome [derived from the observed SNP genotype data by a hidden Markov model; see Broman et al. (2019)]. The second object is an n by 2 matrix of phenotype values. Each column is a phenotype and each row is a subject. The third object is an n by c matrix of covariates, where each row is a subject and each column is a covariate.
One additional object is the genotype-derived kinship matrix, which is used in the linear mixed model to account for population structure. We are focusing on a defined genomic interval, and we prefer to use a kinship matrix derived by the "leave one chromosome out" (LOCO) method (Yang et al. 2014), in which the kinship matrix is derived from the genotypes for all chromosomes except the chromosome under test.

Statistical Models
Focusing on a pair of traits and a particular genomic region of interest, the next step is a two-dimensional, two-QTL scan (Jiang and Zeng 1995). We consider two QTL with each affecting a different trait, and consider all possible pairs of locations for the two QTL. For each pair of positions, we fit the multivariate linear mixed effects model defined in Equation 1. Note that we have assumed an additive genetic model throughout our analyses, but extensions to design matrices that include dominance are straightforward.
where Y is the n by 2 matrix of phenotypes values; X is a 2n by 2ðk þ cÞ matrix that contains the k allele probabilities for the two QTL positions and the c covariates in diagonal blocks; B is a ðk þ cÞ by 2 matrix of allele effects and covariate effects; G is a n by 2 matrix of random effects; and E is a n by 2 matrix of random errors. n is the number of mice. The 'vec' operator stacks columns from a matrix into a single vector. For example, a 2 by 2 matrix inputted to 'vec' results in a vector with length 4. Its first two entries are the matrix's first column, while the third and fourth entries are the matrix's second column.
We also impose distributional assumptions on G and E: and where MN nx2 ð0; V r ; V c Þ denotes the matrix-variate (n by 2) normal distribution with mean being the n by 2 matrix with all zero entries and row covariance V r and column covariance V c . We assume that G and E are independent.
Parameter inference and log likelihood calculation Inference for parameters in multivariate linear mixed effects models is notoriously difficult and can be computationally intense (Meyer 1989(Meyer , 1991. Thus, we estimate V g and V e under the null hypothesis of no QTL, and then take them as fixed and known in our two-dimensional, two-QTL genome scan. We use restricted maximum likelihood methods to fit the model: where X 0 is a 2n by 2ðc þ 1Þ matrix whose first column of each diagonal block in X 0 has all entries equal to one (for an intercept); the remaining columns are the covariates. We draw on our R implementation (Boehm 2018) of the GEMMA algorithm for fitting a multivariate linear mixed effects model with expectation-maximization (Zhou and Stephens 2014). We use restricted maximum likelihood fits for the variance components V g and V e in subsequent calculations of the generalized least squares solutionB.
where 5 denotes the Kronecker product, K is the kinship matrix, and I n is a n by n identity matrix. We then calculate the log likelihood for a normal distribution with mean XvecðBÞ and covarianceΣ that depends on our estimates of V g and V e (Equation 6).
Pleiotropy vs. separate QTL hypothesis testing framework Our test applies to two traits considered simultaneously. Below, l 1 and l 2 denote putative locus positions for traits one and two. We quantitatively state the competing hypotheses for our test as: Our likelihood ratio test statistic is: LOD ¼ log 10 max l 1 ;l 2 LðB; Σ; l 1 ; l 2 Þ max l LðB; Σ; l; lÞ where L is the likelihood for fixed QTL positions, maximized over all other parameters. The denominator concerns the likelihood for the null hypothesis of pleiotropy, where l ¼ l 1 ¼ l 2 .

Visualizing profile LOD traces
The output of the above analysis is a two-dimensional log 10 likelihood surface. To visualize these results, we followed an innovation of Zeng et al. (2000) and Tian et al. (2016), and plot three traces: the results along the diagonal (corresponding to the null hypothesis of pleiotropy), and then the profiles derived by fixing one QTL's position and maximizing over the other QTL's position. We define the LOD score for our test: where ll 10 denotes log 10 likelihood. We follow Zeng et al. (2000) and Tian et al. (2016) in defining profile LOD by the equation We define profile LOD 2 ðl 2 Þ analogously. The profile LOD 1 and profile LOD 2 traces have the same maximum value, which is non-negative and gives the overall LOD test statistic. We construct the pleiotropy trace by calculating the log-likelihoods for the pleiotropic models at every position.
By definition, the maximum value for this pleiotropy trace is zero.

Bootstrap for test statistic calibration
We use a parametric bootstrap to determine statistical significance (Efron 1979). While Jiang and Zeng (1995) used quantiles of a chisquared distribution to determine p-values, this does not account for the two-dimensional search over QTL positions. We follow the approach of Tian et al. (2016), and identify the maximum likelihood estimate of the QTL position under the null hypothesis of pleiotropy. We then use the inferred model parameters under that model and with the QTL at that position to simulate bootstrap data sets according to the model in equations 1-3. For each of b bootstrap data sets, we perform a two-dimensional QTL scan (over the genomic region of interest) and derive the test statistic value. We treat these b test statistics as the empirical null distribution, and calculate a p-value as the proportion of the b bootstrap test statistics that equal or exceed the observed one, with the original data, p ¼ #fi : LOD Ã i $ LODg=b where LOD Ã i denotes the LOD score for the ith bootstrap replicate and LOD is the observed test statistic.

SIMULATION STUDIES
We performed two types of simulation studies, one for type I error rate assessment and one to characterize the power to detect separate QTL. To simulate traits, we specified X, B, V g , K, and V e matrices (Equations 1-3). For both we used the allele probabilities from a single genomic region derived empirically from data for a set of 479 Diversity Outbred mice from Keller et al. (2018).
Type I error rate analysis To quantify type I error rate (i.e., false positive rate), we simulated 400 pairs of traits for each of eight sets of parameter inputs (Table 1). We used a 2 3 factorial experimental design with three factors: allele effects difference, allele effects partitioning, and genetic correlation, i.e., the off-diagonal entry in the 2 by 2 matrix V g .
We chose two strong allele effects difference values, 6 and 12. These ensured that the univariate phenotypes mapped with high LOD scores to the region of interest. For the allele partitioning factor, we used either equally frequent QTL alleles, or a private allele in the CAST strain (F). For the residual genetic correlation (the offdiagonal entry in V g ), we considered the values 0 and 0.6. The marginal genetic variances (i.e., the diagonal entries in V g ) for each trait were always set to one.
We performed 400 simulation replicates per set of parameter inputs, and each used b ¼ 400 bootstrap samples. For each bootstrap sample, we calculated the test statistic (Equation 8). We then compared the test statistic from the simulated trait against the empirical distribution of its 400 bootstrap test statistics. When the simulated trait's test statistic exceeded the 0.95 quantile of the empirical distribution of bootstrap test statistics, we rejected the null hypothesis. We observed that the test is slightly conservative over our range of parameter selections (Table 1), with estimated type I error rates , 0.05.

Power analysis
We also investigated the power to detect the presence of two distinct QTL. We used a 2 · 2 · 5 experimental design, where our three factors were allele effects difference, allele effects partitioning, and inter-locus distance. The two levels of allele effects difference were 1 and 2. The two levels of allele effects partitioning were as in the type I error rate studies, ABCD:EFGH and F:ABCDEGH (Table S1). The five levels of interlocus distance were 0, 0.5, 1, 2, and 3 cM. V g and V e were both set to the 2 by 2 identity matrix in all power study simulations.
We simulated 400 pairs of traits per set of parameter inputs. For each simulation replicate, we calculated the likelihood ratio test statistic. We then applied our parametric bootstrap to determine the statistical significance of the results. For each simulation replicate, we used b ¼ 400 bootstrap samples. Because the bootstrap test statistics within a single set of parameter inputs followed approximately the same distribution, we pooled the 400 · 400 ¼ 160; 000 bootstrap samples per set of parameter inputs and compared each test statistic to the empirical distribution derived from the 160,000 bootstrap samples. However, for parameter inputs with interlocus distance equal to zero, we did not pool the 160,000 bootstrap samples; instead, we proceeded by calculating power (i.e., type I error rate, in this case), as we did in the type I error rate study above.
We present our power study results in Figure 1. Power increases as interlocus distance increases. The top two curves correspond to the case where the QTL effects are largest. For each value for the QTL effect, power is greater when the QTL alleles are equally frequent, and smaller when a QTL allele is private to one strain. One can have high power to detect that the two traits have distinct QTL when they are separated by . 1 cM and when the QTL have large effect. We provide example profile LOD plots from the power analysis in Figure S3.

APPLICATION
To illustrate our methods, we applied our test to data from Logan et al. (2013) and Recla et al. (2014), on 261 DO mice measured for a set of behavioral phenotypes. Recla et al. (2014) identified Hydin as the gene that underlies a QTL on Chromosome 8 at 57 cM for the "hot plate latency" phenotype (a measure of pain tolerance). The phenotype "percent time in light" in a light-dark box (a measure of anxiety) was measured on the same set of mice (Logan et al. 2013) and also shows a QTL near this location, which led us to ask whether the same locus affects both traits. The two traits show a correlation of 20:15 ( Figure S1).
QTL analysis with the LOCO method, and using sex as an additive covariate, showed multiple suggestive QTL for each phenotype ( Figure S2; Table S2). For our investigation of pleiotropy, we focused on the interval 53-64 cM on Chromosome 8. Univariate analyses showed a QTL in this region for both traits (Figure 2).
The estimated QTL allele effects for the two traits are quite different (Figure 3). With the QTL placed at 55 cM, for "percent time in light", the WSB and PWK alleles are associated with large phenotypes and NOD with low phenotypes. For "hot plate latency", on the other hand, CAST and NZO show low phenotypes and NOD and PWK are near the center.
In applying our test for pleiotropy, we performed a two-dimensional, two-QTL scan for the pair of phenotypes. With these results, we created a profile LOD plot (Figure 4). The profile LOD for "percent time in light" (in brown) peaks near 55 cM, as was seen in the univariate analysis. The profile LOD for "hot plate latency" (in blue) peaks near 58 cM, also similar to the univariate analysis. The pleiotropy trace (in gray) peaks near 58 cM.
The likelihood ratio test statistic for the test of pleiotropy was 1.2. Based on a parametric bootstrap with 1,000 bootstrap replicates, the estimated p-value was 0.11. Thus, by our approach, the evidence for the two traits having distinct QTL is weak.

DISCUSSION
We developed a test of pleiotropy vs. separate QTL for multiparental populations, extending the work of Jiang and Zeng (1995) for multiple alleles and with a linear mixed model to account for population structure (Kang et al. 2010;Yang et al. 2014). Our simulation studies indicate that our test is slightly conservative, with type I error rates below their nominal values (Table 1). Power is affected by many factors (including sample size, effect size, and allele frequencies). We studied the effects of interlocus distance and QTL effect on power, and we showed that our test has power to detect presence of separate loci, especially when univariate trait associations are strong (Figure 1).
In the application of our method to two behavioral phenotypes in a study of 261 Diversity Outbred mice (Recla et al. 2014;Logan et al. 2013), the evidence for the presence of two distinct QTL, with one QTL (which contains the Hydin gene) affecting only "hot plate latency" and a second QTL affecting "percent time in light" was weak (P = 0.11, Figure 4).
Founder allele effects plots provide further evidence for the presence of two distinct loci. As Macdonald and Long (2007) and King et al. (2012) have demonstrated in their analyses of multiparental Drosophila populations, a biallelic pleiotropic QTL would result in allele effects plots that have similar patterns. While we do not know that "percent time in light" and "hot plate latency" arise from biallelic QTL, the dramatic differences that we observe Figure 1 Pleiotropy vs. separate QTL power curves for each of four sets of parameter settings. Factors that differ among the four curves are allele effects difference and allele partitioning. Red denotes high allele effects difference, while black is the low allele effects difference. Solid line denotes the even allele partitioning (ABCD:EFGH), while dashed line denotes the uneven allele partitioning (F:ABCDEGH).
n Table 1 Type I error rates for all runs in our 2 3 experimental design. We set (marginal) genetic variances (i.e., diagonal elements of V g ) to 1 in all runs. V e was set to the 2 by 2 identity matrix in all runs. We used allele probabilities at a single genetic marker to simulate traits for all eight sets of parameter inputs. In the column "Allele effects partitioning", "ABCD:EFGH" means that lines A-D carry one QTL allele while lines E-H carry the other allele. "F:ABCDEGH" means the QTL has a private allele in strain F in allele effects patterns further support the argument for two distinct loci.

Run
We have implemented our methods in an R package qtl2pleio, but analyses can be computationally intensive and time consuming.
qtl2pleio is written mostly in R, and so we could likely obtain improved computational speed by porting parts of the calculations to a compiled language such as C or C++. To accelerate our multi-dimensional QTL scans, we have integrated C++  code into qtl2pleio, using the Rcpp package (Eddelbuettel et al. 2011).
Another computational bottleneck is the estimation of the variance components V g and V e . To accelerate this procedure, especially for the joint analysis of more than two traits, we will consider other strategies for variance component estimation, including that described by Meyer et al. (2018). Meyer et al. (2018), in joint analysis of dozens of traits, implement a bootstrap strategy to estimate variance components for lower-dimensional phenotypes before combining bootstrap estimates into valid covariance matrices for the full multivariate phenotype. Such an approach may ease some of the computational burdens that we encountered.
We view tests of pleiotropy as complementary to mediation tests and related methods that have become popular for inferring biomolecular causal relationships (Chick et al. 2016;Schadt et al. 2005;Baron and Kenny 1986). A mediation test proceeds by including a putative mediator as a covariate in the regression analysis of phenotype and QTL genotype; a substantial reduction in the association between genotype and phenotype corresponds to evidence of mediation.
Mediation analyses and our pleiotropy test ask distinct, but related, questions. Mediation analysis seeks to establish causal relationships among traits, including molecular traits, or dependent biological and behavioral processes. Pleiotropy tests examine whether two traits share a single source of genetic variation, which may act in parallel or in a causal network. Pleiotropy is required for causal relations among traits. In many cases, the pleiotropy hypothesis is the only reasonable one. Schadt et al. (2005) argued that both pleiotropy tests and causal inference methods may contribute to gene network reconstruction. They developed a model selection strategy, based on the Akaike Information Criterion (Akaike 1974), to determine which causal model is most compatible with the observed data. Schadt et al. (2005) extended the methods of Jiang and Zeng (1995) to consider more complicated alternative hypotheses, such as the possibility of two QTL, one of which associates with both traits, and one of which associates with only one trait. As envisioned by Schadt et al. (2005), we foresee complementary roles emerging for our pleiotropy test and mediation tests in the dissection of complex trait genetic architecture.
Two related approaches for identifying and exploiting pleiotropy deserve mention. First, CAPE (Combined Analysis of Pleiotropy and Epistasis) is a strategy for identifying higher-order relationships among traits and marker genotypes (Tyler et al. 2013(Tyler et al. , 2016 and has recently been extended for use with multiparental populations, including DO mice (Tyler et al. 2017). CAPE exploits the pleiotropic relationship among traits in order to characterize the underlying network of QTL, and it can suggest possible pleiotropic effects, but it does not provide an explicit test of pleiotropy. Second, Schaid et al. (2016) described a test for pleiotropy in the context of human genome-wide association studies (GWAS). Their approach is fundamentally different from ours, in that rather than ask whether traits are affected by a common locus or distinct loci, they ask whether the traits are all affected by a particular SNP or only some are. The difference in these approaches may be attributed to the difference in mapping resolution between human GWAS and experimental populations.
Technological advances in mass spectrometry and RNA sequencing have enabled the acquisition of high-dimensional biomolecular phenotypes (Ozsolak and Milos 2011;Han et al. 2012). Multiparental populations in Arabidopsis, maize, wheat, oil palm, rice, Drosophila, yeast, and other organisms enable high-precision QTL mapping (Yu et al. 2008;Tisné et al. 2017;Stanley et al. 2017;Raghavan et al. 2017;Mackay et al. 2012;Kover et al. 2009;Cubillos et al. 2013). The need to analyze high-dimensional phenotypes in multiparental populations compels the scientific community to develop tools to study genotype-phenotype relationships and complex trait architecture. Our test, and its future extensions, will contribute to these ongoing efforts.

ACKNOWLEDGMENTS
The authors thank Lindsay Traeger, Julia Kemis, Qiongshi Lu, Rene Welch, and two anonymous referees for valuable suggestions to improve the manuscript. This work was supported in part by National Institutes of Health grants R01GM070683 (to K.W.B.) and P50DA039841 (to E.J.C.). The research made use of Figure 4 Profile LOD curves for the pleiotropy vs. separate QTL hypothesis test for "percent time in light" and "hot plate latency". Gray trace denotes pleiotropy LOD values. Likelihood ratio test statistic value corresponds to the height of the blue and gold traces at their maxima.
compute resources and assistance of the UW-Madison Center For High Throughput Computing (CHTC) in the Department of Computer Sciences at UW-Madison, which is supported by the Advanced Computing Initiative, the Wisconsin Alumni Research Foundation, the Wisconsin Institutes for Discovery, and the National Science Foundation, and is an active member of the Open Science Grid, which is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science.