Abstract

The iteratively reweighted least square (IRLS) method is mostly identical to maximum likelihood (ML) method in terms of parameter estimation and power of quantitative trait locus (QTL) detection. But the IRLS is greatly superior to ML in terms of computing speed and the robustness of parameter estimation. In conjunction with the priors of parameters, ML can analyze multiple QTL model based on Bayesian theory, whereas under a single QTL model, IRLS has very limited statistical power to detect multiple QTLs. In this study, we proposed the iteratively reweighted least absolute shrinkage and selection operator (IRLASSO) for extending IRLS to simultaneously map multiple QTLs. The LASSO with coordinate descent step is employed to efficiently estimate non-zero genetic effect of each locus scanned over entire genome. Simulations demonstrate that IRLASSO has a higher precision of parameter estimation and power to detect QTL than IRLS, and is able to estimate residual variance more accurately than the unweighted LASSO based on LS. Especially, IRLASSO is very fast, usually taking less than five iterations to converge. The barley dataset from the North American Barley Genome Mapping Project is reanalyzed by our proposed method.

INTRODUCTION

In earlier quantitative trait loci (QTL) mapping based on a single QTL model, least square (LS) method developed by Haley and Knott [1] was generally considered to be simpler and more robust than maximum likelihood method [2, 3]. However, LS method would result in the inflation of residual variance after replacing QTL genotype indicator by its expectation conditional on the flanking markers [1]. By quantifying the inflated residual variance, iteratively reweighted least square (IRLS) was therefore proposed by Xu [4–6]. Simulations demonstrated that for normal traits, the performance of IRLS is almost identical to that of maximum likelihood (ML), but the computational speed of IRLS is many times faster than that of ML [5]. In addition, IRLS inherits the robustness of LS. The estimates of IRLS are essentially suboptimal when the weighted value was treated as a constant. As a result, Han and Xu [7] introduced the Fisher scoring algorithm for finding the maximum likelihood estimates of QTL parameters. Within the framework of generalized linear model, subsequently, the IRLS was extended to analyzing discrete traits, such as binary, ordinal and Poisson traits, by taking into account the heterogeneity of individual residual variances [8]. Meanwhile, this method is more appropriate to identify large QTL that locates in large maker intervals than the ML [7, 8].

As most of quantitative traits are controlled by multiple genetic and environmental factors, aforementioned interval mapping methods are not optimal without considering other linked QTLs. Thus, some mapping methodologies based on multiple QTL model were developed to improve the power of QTL detection. Theoretically, the multiple QTL mapping can be regarded as the statistical question of model search and selection, for either non-Bayesian model selection methods [9–16] or Bayesian model selection methods [17–22]. Compared with interval mapping, these advanced statistical methods tend to be more sensitive to the experimental designs, the data structures, the initial values of the parameters and other factors such as assumptions of the models. Specifically, the results of Bayesian mapping are hard to be interpreted due to lack of significance test. Although the Bayes factor was used to infer the significance of QTL in Bayesian model selection mapping by Yi et al. [20, 21], its distribution also depends on population structure and marker density. Che and Xu [23] have proposed significance test in Bayesian shrinkage analysis by using permutation test, but computation burden of the approach is heavy.

Currently, the Bayesian shrinkage methods increasingly become a promising and active research area of QTL mapping and genomic selection [15, 24–28], in which, all estimated effects are forced to shrink toward zero. The fully Bayesian shrinkage has heavy computing burden and thus has not been widely applied to mapping QTL. In contrast, empirical Bayes methods [24, 29, 30] can produce results similar to those of the fully Bayesian approaches but could be quickly computed. LASSO is another shrinkage method that has been widely used in regression analysis for large models [31]. It can solve large equations extremely fast using coordinate descent algorithm [32]. If a double-exponential prior is assigned to regression coefficient, the Bayesian estimates are equivalent to the LASSO estimates [33]. Many improved LASSO [16, 26, 29, 34] have been developed for QTL mapping. By choosing various priors for QTL effects, those mapping methods may enhance shrinking efficiency to a certain extent, but their computing speeds are far inferior to regular LASSO with a double-exponential prior [31]. Moreover, these studies only analyze known markers and do not provide an efficient and convincing method for statistically inferring QTL.

In this study, we extend Xu’s IRLS to simultaneously mapping multiple QTLs. Fast LASSO [32, 35] is employed to estimate genetic effect of each locus scanned over entire genome. This mapping method is known as iteratively reweighted LASSO. It can, not only estimate QTL effects, but also test the significance of QTLs, like regular interval mapping. The performance of our proposed method is demonstrated via computer simulation. Finally, we apply this method to reanalyzing the barley dataset from the North American Barley Genome Mapping Project.

METHOD

Genetic model

Our mapping method is developed for improving linkage analysis. It requires that on n individuals, phenotypes are observed and m co-dominant markers with known linkage map are genotyped in a genetically designed population. Prior to mapping QTL, all positions on whole genome may be candidates of QTLs. The difference among present mapping methods lies mainly in the way to search QTLs on the genome. Traditional interval mapping identifies QTL by genome-wide scanning, whereas Bayesian mapping by randomly sampling QTL positions. For mapping multiple QTLs, we partition the entire genome into k loci that include not only the marker positions but also points between markers. The points between markers are equally spaced by the length of scanning step or tuning value used in sampling QTL position. The linear model of phenotypic value (yi, for i = 1,2, … ,n) on QTL effects of k loci is then expressed as  

(1)
formula
where βj is the jth systematic environmental effect that is irrelevant to QTL effects, xij is the corresponding design matrix on the ith individual, γj is the jth locus genetic effect, zij is the corresponding incidence matrix determined by the genotypes of locus k, εi is the residual error, which is assumed to be a normal distribution forumla with σ2 being residual variance.

In the Equation (1), zij is missing value unless the locus is just a fully informative marker. Following LS method by Haley and Knott [1], we can simply estimate unknown zij by the expectation of zij conditional on flanking markers genotype. Thus, Equation (1) becomes  

(2)
formula
where uij = E(zij) is the conditional expectation of zij, which depends on the genetically designed population (see the Appendix for details).

Considering the heterogeneity of residual variance after replacing zij by its conditional expectation uij, we exactly write Equation (1) by  

(3)
formula
Let  
(4)
formula
be the new residual error, which combines unobservable zij with the residual error εi. Then, Equation (2) can be rewritten as  
(5)
formula

Define forumlaforumla and forumla. In the notation of matrix, the Equation (5) is described as  

(6)
formula

The expectation of ei is  

(7)
formula
and its variance is  
(8)
formula
with forumla for j and j′=1, 2 ⋯, k being the covariance matrix of indicator variances. Different from what Xu [6] defined earlier, Σi is derived from multiple QTLs and considers the linkages among multiple QTLs,where rjj is recombination coefficient between loci j and j′, rjj = 0 if j = j′. forumla and forumla are defined as conditional variances at loci j and j′ for the ith individual, which have explicit form:  
(9)
formula

The equation has different form due to genetically designed population, as shown in the Appendix. If these loci overlap with completely informative markers, forumla under the assumption of no segregation distortion.

Parameter estimation

The s in Equation (5) can be relatively small, especially in genetically designed population with a single family, but k may vary large and sometimes may be even larger than the sample size n. Thus, we decide to adopt a modified LASSO to estimate the effects of these loci.

Transform residual variance forumla into forumla with forumla being weighted value, and let forumla, forumla and forumla. By minimizing forumlaforumla, the regression coefficients in Equation (5) can be solved using extremely fast LASSO with a coordinate descent step [35]:to partially optimize with respect to γl(l = 1,2,…,k), the gradient at forumla is computed by  

formula

The equation is required to treat separately by forumla, forumla or forumla. Simple calculus for the coordinate-wise update [36, 37] is implemented in the following way  

formula
where, forumla is the fitted value excluding the genetic effect for the lth loci, and forumla denoted by z is the simple least-squares coefficient when fitting this partial residual to the lth loci. S(z,λ) is the soft-thresholding operator with value  
formula

As a tuning parameter, λ can be chosen by cross-validation for LASSO. In our computation, however, path-wise coordinate descent proposed by Friedman et al. [32] is adopted to choose lambda efficiently.

Because wi is a function of estimated parameters, iterations are required. The iteration process is recommended as follow:

  • Initialize wi = 1

  • Estimate βj, γj and σ2 using unweighted LASSO

  • Update wi using σ2 and non-zero βj and γj

  • Estimate βj and γj using the weighted LASSO

  • Repeat step 3 and 4 until forumla ≤10−6 or 10−8, where t is iterating time.

Herein, we call the iteration process as iteratively reweighted LASSO.

Significance test

A subtset of loci with non-zero genetic effects are retained after the shrinkage of IRLASSO. Once the number of non-zero effects is greater than the sample size, we can drop those linked loci by assuming that there is only one QTL within a marker interval. For the retained loci with non-zero effects, the iteratively weighted least square estimates of the parameters are  

(10)
formula
where forumla.

and  

(11)
formula

The variance–covariance matrix of the estimated parameter is calculated as  

(12)
formula

From Equations (10), (11) and (12), we can form t-test statistic to test the significance of the retained loci with non-zero effects.

Considering that t-test statistic is influenced by population structure and marker density, we use the critical threshold generated by permutation tests [38]. By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of the maximum t-statistic values are calculated, from the distribution of which, the critical threshold is obtained.

Simulations

Simulations are conducted to investigate statistical behavior of our proposed method. In a backcross population, we simulate six chromosomes, each with a length of 100 cM and evenly placed co-dominant markers. The 2, 2, 1, 2, 1 and 2 QTLs are respectively put on six chromosomes, whose positions and genetic effects are listed in Tables 1 and 2. To access the influence of sample size and marker density on mapping results, sample size is designed as the three levels: 150, 200 and 300. For sample sizes of 150 and 300, 11 markers are distributed on each chromosome. For sample sizes of 200, there are 21 markers on each of the first 2 chromosomes, 11 on the middle 2 chromosomes and 5 on the rest two chromosomes. Only one QTL is put on each chromosome, whose genetic effects are classified into two levels of 2.5 and 0.8 on the chromosome with same marker density.

Table 1:

Mean estimates and standard deviations (in parentheses) of QTL positions detected with three mapping methods for the simulated datasets with sample sizes 150 and 300

Sample size Method QTL no.
 
  10 
 True 23 56 148 193 267 332 390 478 522 574 
150 IRLASSO 25.1 (2.6) 58.5 (2.9) 151.5 (3.7) 204.3 (12.3) 264.8 (8.0) 335.7 (4.3) 403.9 (13.5) 474.8 (11.5) 524.9 (5.3) 580.5 (4.6) 
UWLASSO 25.0 (2.8) 58.4 (2.9) 151.2 (4.0) 204.8 (12.5) 264.2 (8.1) 335.6 (4.4) 403.9 (13.8) 474.2 (11.7) 524.5 (5.6) 580.5 (4.7) 
IRLS 44.6 (8.3) 48.3 (9.3) 166.1 (13.1) 186.7 (10.8) 267.1 (20.1) 332.4 (15.3) 386.7 (15.2) 480.3 (8.3) 533.5 (7.5) 580.1 (13.8) 
300 IRLASSO 25.4 (1.9) 58.8 (2.1) 151.6 (3.0) 196.6 (2.3) 269.5 (5.0) 337.1 (2.9) 395.3 (3.2) 482.4 (2.1) 526.2 (4.7) 580.6 (3.1) 
UWLASSO 25.3 (2.2) 58.5 (2.6) 151.4 (3.2) 196.7 (2.3) 269.4 (5.0) 336.9 (3.3) 395.0 (3.4) 482.2 (2.5) 525.8 (5.2) 580.3 (3.4) 
IRLS 45.0 (7.7) 48.2 (8.8) 169.1 (11.2) 188.8 (9.2) 265.9 (20.3) 332.4 (13.7) 388.8 (13.8) 481.2 (5.8) 537.8 (17.1) 582.3 (10.6) 
Sample size Method QTL no.
 
  10 
 True 23 56 148 193 267 332 390 478 522 574 
150 IRLASSO 25.1 (2.6) 58.5 (2.9) 151.5 (3.7) 204.3 (12.3) 264.8 (8.0) 335.7 (4.3) 403.9 (13.5) 474.8 (11.5) 524.9 (5.3) 580.5 (4.6) 
UWLASSO 25.0 (2.8) 58.4 (2.9) 151.2 (4.0) 204.8 (12.5) 264.2 (8.1) 335.6 (4.4) 403.9 (13.8) 474.2 (11.7) 524.5 (5.6) 580.5 (4.7) 
IRLS 44.6 (8.3) 48.3 (9.3) 166.1 (13.1) 186.7 (10.8) 267.1 (20.1) 332.4 (15.3) 386.7 (15.2) 480.3 (8.3) 533.5 (7.5) 580.1 (13.8) 
300 IRLASSO 25.4 (1.9) 58.8 (2.1) 151.6 (3.0) 196.6 (2.3) 269.5 (5.0) 337.1 (2.9) 395.3 (3.2) 482.4 (2.1) 526.2 (4.7) 580.6 (3.1) 
UWLASSO 25.3 (2.2) 58.5 (2.6) 151.4 (3.2) 196.7 (2.3) 269.4 (5.0) 336.9 (3.3) 395.0 (3.4) 482.2 (2.5) 525.8 (5.2) 580.3 (3.4) 
IRLS 45.0 (7.7) 48.2 (8.8) 169.1 (11.2) 188.8 (9.2) 265.9 (20.3) 332.4 (13.7) 388.8 (13.8) 481.2 (5.8) 537.8 (17.1) 582.3 (10.6) 
Table 2:

Mean estimates and standard deviations (in parentheses) of QTL effects obtained with three mapping methods for the simulated datasets with sample sizes 150 and 300

Sample size Method QTL no.
 
  10 
 True effect 1.50 2.00 0.72 1.10 −0.22 0.70 −0.65 1.25 0.35 −0.80 
150 IRLASSO 1.52 (0.19) 1.98 (0.18) 0.91 (0.28) 0.74 (0.54) −0.21 (0.17) 0.61 (0.19) −0.40 (0.41) 1.12 (0.24) 0.35 (0.16) −0.77 (0.15) 
UWLASSO 1.52 (0.19) 1.98 (0.19) 0.91 (0.28) 0.71 (0.56) −0.21 (0.18) 0.61 (0.20) −0.39 (0.42) 1.10 (0.26) 0.34 (0.16) −0.77 (0.16) 
IRLS 2.92 (0.22) 2.95 (0.22) 1.43 (0.29) 1.52 (0.30) −0.28 (0.48) 0.50 (0.45) −0.34 (0.52) 1.29 (0.28) −0.08 (0.52) −0.72 (0.39) 
300 IRLASSO 1.55 (0.11) 1.95 (0.12) 0.72 (0.09) 1.09 (0.10) −0.23 (0.07) 0.70 (0.10) −0.65 (0.10) 1.24 (0.07) 0.35 (0.10) −0.79 (0.10) 
UWLASSO 1.55 (0.14) 1.94 (0.14) 0.71 (0.11) 1.09 (0.11) −0.22 (0.09) 0.69 (0.11) −0.65 (0.12) 1.24 (0.08) 0.35 (0.11) −0.78 (0.12) 
IRLS 2.88 (0.18) 2.90 (0.18) 1.35 (0.22) 1.46 (0.21) −0.26 (0.32) 0.49 (0.33) −0.39 (0.40) 1.27 (0.21) −0.15 (0.42) −0.72 (0.26) 
Sample size Method QTL no.
 
  10 
 True effect 1.50 2.00 0.72 1.10 −0.22 0.70 −0.65 1.25 0.35 −0.80 
150 IRLASSO 1.52 (0.19) 1.98 (0.18) 0.91 (0.28) 0.74 (0.54) −0.21 (0.17) 0.61 (0.19) −0.40 (0.41) 1.12 (0.24) 0.35 (0.16) −0.77 (0.15) 
UWLASSO 1.52 (0.19) 1.98 (0.19) 0.91 (0.28) 0.71 (0.56) −0.21 (0.18) 0.61 (0.20) −0.39 (0.42) 1.10 (0.26) 0.34 (0.16) −0.77 (0.16) 
IRLS 2.92 (0.22) 2.95 (0.22) 1.43 (0.29) 1.52 (0.30) −0.28 (0.48) 0.50 (0.45) −0.34 (0.52) 1.29 (0.28) −0.08 (0.52) −0.72 (0.39) 
300 IRLASSO 1.55 (0.11) 1.95 (0.12) 0.72 (0.09) 1.09 (0.10) −0.23 (0.07) 0.70 (0.10) −0.65 (0.10) 1.24 (0.07) 0.35 (0.10) −0.79 (0.10) 
UWLASSO 1.55 (0.14) 1.94 (0.14) 0.71 (0.11) 1.09 (0.11) −0.22 (0.09) 0.69 (0.11) −0.65 (0.12) 1.24 (0.08) 0.35 (0.11) −0.78 (0.12) 
IRLS 2.88 (0.18) 2.90 (0.18) 1.35 (0.22) 1.46 (0.21) −0.26 (0.32) 0.49 (0.33) −0.39 (0.40) 1.27 (0.21) −0.15 (0.42) −0.72 (0.26) 

The simulated datasets are analyzed by using IRLASSO, unweighted LASSO (UWLASSO) and IRLS. The critical values of the test statistic for declaring statistical significance may be different due to different methods and situations. Under the null model with zero genetic effects, we simulate 500 additional samples to determine the critical values (Table 3). The 400 replicated simulations are used to estimate QTL parameters and to access the statistical power of QTL detection. Statistical power of QTL detection is calculated by each locus as the percentage of the number of those simulations that statistic value exceeds critical value at the locus. Also, false positive rate is assessed with the 400 replicated simulations under the null model with zero genetic effects.

Table 3:

The empirical critical thresholds obtained by three mapping methods under different sample sizes

Method Sample size
 
 150 200 300 
IRLASSO 2.39 2.39 2.54 
UWLASSO 2.38 2.36 2.55 
IRLS 2.90 2.83 2.98 
Method Sample size
 
 150 200 300 
IRLASSO 2.39 2.39 2.54 
UWLASSO 2.38 2.36 2.55 
IRLS 2.90 2.83 2.98 

For sample sizes of 150 and 300, the statistical powers to detect QTLs with three mapping methods are given in Table 4, along with false positive rate and exactly identifying proportion for all true QTLs. As can be seen, although the statistical powers at each locus between IRLASSO and UWLASSO are similar, statistical powers of IRLS are greatly lower than those of IRLASSO and UWLASSO at those loci with smaller genetic effects. False positive rates are <10% for the two sample sizes and three mapping methods, but IRLASSO performs lower false positive rates than the others. Exactly identifying proportions for all true QTLs are generally low due to the difficulty to detect the fifth and ninth QTLs with low genetic effects, although IRLASSO is capable of identifying all true QTLs simultaneously with higher statistical power than UWLASSO and IRLS. The two LASSO methods can estimate QTL effects more accurately than IRLS (Table 2). Generally, the statistical performance of each mapping method increases as sample size and genetic effect increase. Compared with UWLASSO, especially, the IRLASSO is able to precisely evaluate residual variance. With IRLASSO and UWLASSO, the estimates (standard deviations) are 1.09 (0.21) and 1.88 (0.24), respectively, for sample size of 150, whereas 1.07 (0.17) and 1.91 (0.20), respectively, for sample size of 300.

Table 4:

Statistical powers of QTL detection, false positive rate and exactly identifying proportion for all true QTL obtained with three mapping methods for the simulated datasets with sample sizes 150 and 300

Sample size Method Statistical power of each QTL
 
FPR EIP 
  10   
150 IRLASSO 100 100 96.5 100 13.5 98 96.3 100 41 96 4.8 12.3 
UWLASSO 100 100 94 100 13.3 94.5 93.5 98.5 39 96.3 5.5 12 
IRLS 100 100 88 95 0.5 2.5 79 1.3 24 
300 IRLASSO 100 100 100 100 31.5 100 99 100 72.5 100 31.3 
UWLASSO 100 100 100 100 31.3 99.5 98.5 100 71 99 31 
IRLS 100 100 99 100 1.5 15 12 98 3.5 42 6.3 0.3 
Sample size Method Statistical power of each QTL
 
FPR EIP 
  10   
150 IRLASSO 100 100 96.5 100 13.5 98 96.3 100 41 96 4.8 12.3 
UWLASSO 100 100 94 100 13.3 94.5 93.5 98.5 39 96.3 5.5 12 
IRLS 100 100 88 95 0.5 2.5 79 1.3 24 
300 IRLASSO 100 100 100 100 31.5 100 99 100 72.5 100 31.3 
UWLASSO 100 100 100 100 31.3 99.5 98.5 100 71 99 31 
IRLS 100 100 99 100 1.5 15 12 98 3.5 42 6.3 0.3 

FPR, false positive rate; EIP, exactly identifying proportion.

Table 5 shows the statistical powers of QTL detection and estimates of QTL parameters obtained with three mapping methods for sample size of 200. No influence of sparse markers on convergence to the correct solutions has been found. As expected, the statistical power and precision of parameter estimation increase with the increased marker density and QTL effect. The residual variances (standard deviations) are estimated as 1.28 (0.23) and 2.52 (0.29) by IRLASSO and UWLASSO methods, respectively, again demonstrating the capability of more accuracy estimating residual variance by IRLASSO than UWLASSO method.

Table 5:

Statistical powers of QTL detection and parameter estimates (standard deviations) obtained with three mapping methods for the simulated dataset with sample size 200

Method  QTL no.
 
  
True Position 56 157 258 359 460 561 
 Effect 2.5 0.8 2.5 0.8 2.5 0.8 
IRLASSO Power 100 99 100 99.3 100 95.3 
 Position 58.09 (2.98) 461.81 (4.90) 260.96 (2.47) 361.74 (4.09) 159.34 (2.18) 563.60 (4.72) 
 Effect 2.47 (0.13) 0.80 (0.11) 2.47 (0.11) 0.79 (0.10) 2.49 (0.11) 0.80 (0.11) 
UWLASSO Power 100 98 100 99 100 98.5 
 Position 57.83 (3.23) 462.10 (5.19) 260.91 (2.42) 361.56 (4.05) 158.80 (3.18) 563.45 (5.03) 
 Effect 2.46 (0.17) 0.81 (0.13) 2.46 (0.12) 0.80 (0.13) 2.47 (0.16) 0.80 (0.13) 
IRLS Power 100 21 100 29 100 18.5 
 Position 57.08 (2.35) 458.64 (15.71) 259.47 (3.91) 358.20 (14.46) 157.94 (3.04) 560.17 (13.51) 
 Effect 2.54 (0.27) 0.88 (0.40) 2.49 (0.30) 0.89 (0.39) 2.55 (0.26) 0.91 (0.33) 
Method  QTL no.
 
  
True Position 56 157 258 359 460 561 
 Effect 2.5 0.8 2.5 0.8 2.5 0.8 
IRLASSO Power 100 99 100 99.3 100 95.3 
 Position 58.09 (2.98) 461.81 (4.90) 260.96 (2.47) 361.74 (4.09) 159.34 (2.18) 563.60 (4.72) 
 Effect 2.47 (0.13) 0.80 (0.11) 2.47 (0.11) 0.79 (0.10) 2.49 (0.11) 0.80 (0.11) 
UWLASSO Power 100 98 100 99 100 98.5 
 Position 57.83 (3.23) 462.10 (5.19) 260.91 (2.42) 361.56 (4.05) 158.80 (3.18) 563.45 (5.03) 
 Effect 2.46 (0.17) 0.81 (0.13) 2.46 (0.12) 0.80 (0.13) 2.47 (0.16) 0.80 (0.13) 
IRLS Power 100 21 100 29 100 18.5 
 Position 57.08 (2.35) 458.64 (15.71) 259.47 (3.91) 358.20 (14.46) 157.94 (3.04) 560.17 (13.51) 
 Effect 2.54 (0.27) 0.88 (0.40) 2.49 (0.30) 0.89 (0.39) 2.55 (0.26) 0.91 (0.33) 

Examples

The barley dataset from the North American Barley Genome Mapping Project (http://wheat.pw.usda.gov/ggpages/SxM/) is used to demonstrate our proposed methods. This mapping population consists of 150 doubled haploids (DHs) derived from the cross Steptoe × Morex. A total of eight traits are observed under multiple different environments and 223 markers covering a genome of length ∼1500 cM are genotyped for each DH line.

Based on the above dataset, we adopt IRLASSO, UWLASSO and IRLS methods, respectively, to locate QTLs for five traits, including alpha amylase (20° U), grain protein (%), grain yield (MT/ha), height (cm) and lodging (%) in barley. The phenotypic values of these traits are measured by using the averages of observations across multiple environments.

For convenience to compare three mapping methods, we transform statistic t to –log(P), where P is the probability of greater than statistic value t. The genome-wide empirical critical thresholds for five traits are obtained by using 1000 permutation tests. Under the same mapping method, obviously, the critical thresholds are very near among the traits analyzed, as pointed by horizontal reference line in Figure 1. Figure 1 plots the profiles of –log(P) statistics over the genome for the five traits obtained with IRLASSO method. As shown, there are some peaks exceeding corresponding critical thresholds for all traits analyzed, providing the evidence for existence of QTLs controlling alpha amylase, grain protein, grain yield, height and lodging. Most of the QTLs detected, distribute on chromosomes 3 (six QTLs), 2 (five QTLs) and 7 (five QTLs), in which, some QTLs perform the pleiotropy, for instance, the QTL on marker ‘ABG399’ simultaneously governing grain protein, grain yield, height and lodging. Specifically, many QTLs are identified on markers: three markers for alpha amylase, five markers for grain protein, two markers for grain yield, three markers for height and two markers for lodging. UWLASSO identifies the totally same QTLs as those detected with IRLASSO, but more QTLs than those detected with IRLS. Other three traits have also been analyzed, which support the advantage of our proposed method (Results not shown).

Figure 1:

The profile of –log(P) test statistics obtained with three mapping methods for the analyzed five traits in barley: (i) alpha amylase, (ii) grain protein, (iii) grain yield, (iv) height and (v) lodging. In each plot, the genome-wide threshold value is given as a horizontal reference line. Chromosomes are separated by the vertical dotted lines and marker positions are indicated by the ticks on the horizontal axis.

Figure 1:

The profile of –log(P) test statistics obtained with three mapping methods for the analyzed five traits in barley: (i) alpha amylase, (ii) grain protein, (iii) grain yield, (iv) height and (v) lodging. In each plot, the genome-wide threshold value is given as a horizontal reference line. Chromosomes are separated by the vertical dotted lines and marker positions are indicated by the ticks on the horizontal axis.

Table 6 tabulates parameter estimates of QTLs detected with three mapping methods. With IRLASSO, most of the QTLs are identified to be positive effects, suggesting that the alleles of parent ‘Steptoe’ contribute to increase in all traits except for malt extract. The proportions of phenotypic variation explained by the detected QTLs ranged from 2% to 54%. The largest heritability is 0.54 of the QTL for height on chromosome 2, followed by 0.44 of the QTL for grain yield on chromosome 3 and the smallest one is 0.02 of the QTL for height on chromosome 6. For the same trait, the residual variance estimated by UWLASSO is consistently greater than that by IRLASSO, so that the UWLASSO generally underestimates the heritabilities of QTLs detected.

Table 6:

Estimated QTL parameters from the barley QTL mapping experiment

Trait no. QTL no. IRLASSO
 
UWLASSO
 
IRLS 
  Chr-pos. (cM) Marker interval Additive effect Residual variance Heritability −log(PAdditive effect Residual variance Heritability Chr-pos. (cM) 
1–8 Brz ∼ ABC156D 1.20 1.69 0.22 6.71 1.23 2.02 0.22 1–68 
1–88.6 KFP194 1.00  0.16 5.63 0.99  0.14  
2–68 MWG557 ∼ ABG316C 1.02  0.16 6.81 1.00  0.15 2–76 
4–17.1 CDO669 0.72  0.08 4.37 0.70  0.07 4–10 
7–101.8 WG364 0.86  0.12 6.31 0.87  0.11 7–80 
2–76 ABC167B ∼ bBE54D 0.38 0.09 0.27 11.36 0.38 0.11 0.26 2–68 
3–16 ABA307B ∼ ABC171 −0.14  0.04 4.26 −0.13  0.03  
3–52.6 ABG399 0.18  0.06 5.45 0.17  0.05  
4–119.5 ABG500B 0.26  0.13 7.96 0.23  0.10 4–112 
6–6.8 MWG620 0.14  0.04 4.46 0.13  0.03  
6–93.5 Nar7 0.16  0.05 4.36 0.16  0.05  
7–52 Ubi2 ∼ ksuA3A 0.33  0.20 14.33 0.35  0.22 7–56 
7–137.8 MWG514B −0.17  0.05 4.60 −0.17  0.05  
3–52.6 ABG399 −0.26 0.07 0.44 16.66 −0.26 0.08 0.42 3–56 
6–67.5 CDO497 0.12  0.09 4.17 0.12  0.09  
2–40 MWG858 ∼ ABG358 5.60 3.72 0.54 42.53 5.58 4.48 0.52 2–39.3 
2–71.8 ABC167B −2.01  0.07 9.30 −1.96  0.06  
3–29.8 MWG584 1.25  0.03 5.01 1.19  0.02 3–56.1 
3–56 ABG399 ∼ BCD828 2.86  0.14 18.69 2.92  0.14  
6–6 ABG062 ∼ MWG620 0.99  0.02 4.62 0.97  0.02  
7–80 ABC302 ∼ CDO57B 2.66  0.12 16.72 2.74  0.13 7–78.3 
7–137.8 MWG514B −1.23  0.03 5.53 −1.21  0.02  
1–122.8 KFP190 −3.79 39.17 0.09 4.15 3.21 44.87 0.08 1–108 
2–68.2 ABG316C −4.36  0.13 6.76 3.36  0.09  
2–168 ABG316E ∼ Pcr1 3.75  0.09 6.40 2.19  0.04 2–154 
3–54 ABG399 ∼ BCD828 8.09  0.43 13.43 7.33  0.43 3–56 
Trait no. QTL no. IRLASSO
 
UWLASSO
 
IRLS 
  Chr-pos. (cM) Marker interval Additive effect Residual variance Heritability −log(PAdditive effect Residual variance Heritability Chr-pos. (cM) 
1–8 Brz ∼ ABC156D 1.20 1.69 0.22 6.71 1.23 2.02 0.22 1–68 
1–88.6 KFP194 1.00  0.16 5.63 0.99  0.14  
2–68 MWG557 ∼ ABG316C 1.02  0.16 6.81 1.00  0.15 2–76 
4–17.1 CDO669 0.72  0.08 4.37 0.70  0.07 4–10 
7–101.8 WG364 0.86  0.12 6.31 0.87  0.11 7–80 
2–76 ABC167B ∼ bBE54D 0.38 0.09 0.27 11.36 0.38 0.11 0.26 2–68 
3–16 ABA307B ∼ ABC171 −0.14  0.04 4.26 −0.13  0.03  
3–52.6 ABG399 0.18  0.06 5.45 0.17  0.05  
4–119.5 ABG500B 0.26  0.13 7.96 0.23  0.10 4–112 
6–6.8 MWG620 0.14  0.04 4.46 0.13  0.03  
6–93.5 Nar7 0.16  0.05 4.36 0.16  0.05  
7–52 Ubi2 ∼ ksuA3A 0.33  0.20 14.33 0.35  0.22 7–56 
7–137.8 MWG514B −0.17  0.05 4.60 −0.17  0.05  
3–52.6 ABG399 −0.26 0.07 0.44 16.66 −0.26 0.08 0.42 3–56 
6–67.5 CDO497 0.12  0.09 4.17 0.12  0.09  
2–40 MWG858 ∼ ABG358 5.60 3.72 0.54 42.53 5.58 4.48 0.52 2–39.3 
2–71.8 ABC167B −2.01  0.07 9.30 −1.96  0.06  
3–29.8 MWG584 1.25  0.03 5.01 1.19  0.02 3–56.1 
3–56 ABG399 ∼ BCD828 2.86  0.14 18.69 2.92  0.14  
6–6 ABG062 ∼ MWG620 0.99  0.02 4.62 0.97  0.02  
7–80 ABC302 ∼ CDO57B 2.66  0.12 16.72 2.74  0.13 7–78.3 
7–137.8 MWG514B −1.23  0.03 5.53 −1.21  0.02  
1–122.8 KFP190 −3.79 39.17 0.09 4.15 3.21 44.87 0.08 1–108 
2–68.2 ABG316C −4.36  0.13 6.76 3.36  0.09  
2–168 ABG316E ∼ Pcr1 3.75  0.09 6.40 2.19  0.04 2–154 
3–54 ABG399 ∼ BCD828 8.09  0.43 13.43 7.33  0.43 3–56 

DISCUSSION

In this study, we extend Xu’s IRLS for a single QTL model to simultaneously analyzing multiple QTLs, greatly improving the power to detect QTL. So-called IRLASSO method inherits quick convergence of Xu’s IRLS, and uses extremely fast LASSO with coordinate descent step to estimate non-zero QTL effects. Thus, the mapping method can identify multiple QTLs in manner of high computing efficiency, as compared with existing LASSO for QTL mapping. Through multiple regression analysis for few loci with non-zero effects, the IRLASSO method is capable of statistically inferring the significance of QTL detected. Like regular interval mapping, this significance test includes construction of test statistic and permutation test for determining critical value. It should be clarified that if the marker density is sufficiently high, there is no need for the IRLS because LS, ML and IRLS, all would be the same. High dense marker maps are not always available for many species. Therefore, our proposed method remains useful as long as marker density is low or there are missing genotypes or both. Missing genotypes will always happen no matter how dense a marker map is.

Although the unweighted LASSO inflates the estimation of residual variance, as demonstrated in simulations, it is mostly consistent with IRLASSO method in terms of parameter estimation and power of QTL detection. For simplification of computation, we can omit the iteratively weighted step for LASSO and then choose few loci with non-zero effects obtained with UWLASSO. In this study, the iteratively weighted step is used in multiple regression analysis for few loci with non-zero effects to accuracy evaluate residual variance. To enhance shrinking efficiency, actually, the LASSO used in this study needs to be optimized by using cross-validation. Cross-validation of LASSO can obtain fewer loci with non-zero effects, but it will consume more computing time than iteratively weighted step for LASSO. Our simulation shows that those loci with non-zero effects solved from unweighted LASSO completely cover those from cross-validation of LASSO. Therefore, the cross-validation of LASSO is an option in our mapping method. Without the iteratively weighted step and cross-validation for LASSO, our proposed mapping method is faster than Xu’s IRLS. The program that implemented the mapping method is written in R language, which is freely available upon request from authors.

Only one kind of genetic effect is considered in our model for the simplification of description, which is suitable for backcross (BC), DH and recombinant inbreeding lines (RIL). Our method can be easily applied to analyze multiple genetic effects at each locus in other genetically designed population, such as intercross and outbred populations. The extension of continuous quantitative traits to discrete traits is being studied in our lab. It should be noted that the iteratively weighted step is not straightforward to incorporate in the LASSO for generalized linear model. In comparison, the extension of main effect model to epistatic model is not hard to map multiple interacting QTLs.

Key Points

  • We developed the iteratively reweighted least absolute shrinkage and selection operator (IRLASSO) for extending IRLS to simultaneously map multiple QTLs.

  • The LASSO with coordinate descent step is employed to efficiently estimate non-zero genetic effect of each locus scanned over entire genome.

  • The method can, not only estimate QTL effects, but also test the significance of QTLs, like regular interval mapping.

  • Without the iteratively weighted step and cross-validation for LASSO, our proposed mapping method is faster than Xu’s IRLS [4 – 6].

FUNDING

The National Natural Science Foundations of China (grants 30972077, 31172190 to R.Y. and 31201989 to Y.L.); the Applied National Science and Technology Support Program in China (grant 2012BAD26B01 to Y.L.).

References

1
Haley
CS
Knott
SA
A simple regression method for mapping quantitative trait loci in line crosses using flanking markers
Heredity
 , 
1992
, vol. 
69
 (pg. 
315
-
24
)
2
Dempster
AP
Laird
NM
Rubin
DB
Maximum likelihood from incomplete data via the EM algorithm
J R Stat Soc Ser B
 , 
1977
, vol. 
39
 (pg. 
1
-
38
)
3
Lander
ES
Botstein
D
Mapping mendelian factors underlying quantitative traits using RFLP linkage maps
Genetics
 , 
1989
, vol. 
121
 (pg. 
185
-
99
)
4
Xu
S
A comment on the simple regression method for interval mapping
Genetics
 , 
1995
, vol. 
141
 (pg. 
1657
-
9
)
5
Xu
S
Iteratively reweighted least squares mapping of quantitative trait loci
Behav Genet
 , 
1998
, vol. 
28
 (pg. 
341
-
55
)
6
Xu
S
Further investigation on the regression method of mapping quantitative trait loci
Heredity
 , 
1998
, vol. 
80
 (pg. 
364
-
73
)
7
Han
L
Xu
S
A Fisher scoring algorithm for the weighted regression method of QTL mapping
Heredity
 , 
2008
, vol. 
101
 (pg. 
453
-
64
)
8
Xu
S
Hu
Z
Generalized linear model for interval mapping of quantitative trait loci
Theoret App Genet
 , 
2010
, vol. 
121
 (pg. 
47
-
63
)
9
Jansen
RC
Stam
P
High resolution of quantitative traits into multiple loci via interval mapping
Genetics
 , 
1994
, vol. 
136
 (pg. 
1447
-
55
)
10
Zeng
ZB
Precision mapping of quantitative trait loci
Genetics
 , 
1994
, vol. 
136
 (pg. 
1457
-
68
)
11
Kao
CH
Zeng
ZB
Teasdale
RD
Multiple interval mapping for quantitative trait loci
Genetics
 , 
1999
, vol. 
152
 (pg. 
1203
-
16
)
12
Carlborg
Ö
Andersson
L
Kinghorn
B
The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci
Genetics
 , 
2000
, vol. 
155
 (pg. 
2003
-
10
)
13
Reifsnyder
PC
Churchill
G
Leiter
EH
Maternal environment and genotype interact to establish diabesity in mice
Genome Res
 , 
2000
, vol. 
10
 (pg. 
1568
-
78
)
14
Bogdan
M
Ghosh
JK
Doerge
RW
Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci
Genetics
 , 
2004
, vol. 
167
 (pg. 
989
-
99
)
15
Wang
H
Zhang
YM
Li
X
, et al.  . 
Bayesian shrinkage estimation of quantitative trait loci parameters
Genetics
 , 
2005
, vol. 
170
 (pg. 
465
-
80
)
16
Yi
N
Xu
S
Bayesian LASSO for quantitative trait loci mapping
Genetics
 , 
2008
, vol. 
179
 (pg. 
1045
-
55
)
17
Satagopan
JM
Yandell
BS
Newton
MA
, et al.  . 
A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo
Genetics
 , 
1996
, vol. 
144
 (pg. 
805
-
16
)
18
Hoeschele
I
Mapping Quantitative Trait Loci in Outbred Pedigrees
 , 
2001
New York
Wiley
19
Sen
S
Churchill
GA
A statistical framework for quantitative trait mapping
Genetics
 , 
2001
, vol. 
159
 (pg. 
371
-
87
)
20
Yi
N
Yandell
BS
Churchill
GA
, et al.  . 
Bayesian model selection for genome-wide epistatic quantitative trait loci analysis
Genetics
 , 
2005
, vol. 
170
 (pg. 
1333
-
44
)
21
Yi
N
Banerjee
S
Pomp
D
, et al.  . 
Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits
Genetics
 , 
2007
, vol. 
176
 (pg. 
1855
-
64
)
22
Li
J
Das
K
Fu
G
, et al.  . 
The Bayesian lasso for genome-wide association studies
Bioinformatics
 , 
2011
, vol. 
27
 (pg. 
516
-
523
)
23
Che
X
Xu
S
Significance test and genome selection in Bayesian shrinkage analysis
Int J Plant Genomics
 , 
2010
, vol. 
2010
 pg. 
893206
 
24
Xu
S
Derivation of the shrinkage estimates of quantitative trait locus effects
Genetics
 , 
2007
, vol. 
177
 (pg. 
1255
-
8
)
25
Xu
S
Hu
Z
Methods of plant breeding in the genome era
Genet Res
 , 
2010
, vol. 
92
 (pg. 
423
-
41
)
26
Fang
M
Jiang
D
Li
D
, et al.  . 
Improved LASSO priors for shrinkage quantitative trait loci mapping
Theor Appl Genet
 , 
2012
, vol. 
124
 (pg. 
1315
-
24
)
27
Yang
R
Xu
S
Bayesian shrinkage analysis of quantitative trait loci for dynamic traits
Genetics
 , 
2007
, vol. 
176
 (pg. 
1169
-
85
)
28
Cai
X
Huang
A
Xu
S
Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping
BMC Bioinformatics
 , 
2011
, vol. 
12
 pg. 
211
 
29
Xu
S
An expectation-maximization algorithm for the Lasso estimation of quantitative trait locus effects
Heredity
 , 
2010
, vol. 
105
 (pg. 
483
-
94
)
30
Cai
X
Huang
A
Xu
S
Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping
BMC Bioinformatics
 , 
2011
, vol. 
12
 pg. 
211
 
31
Tibshirani
R
Regression shrinkage and selection via the Lasso
J R Statist Soc B
 , 
1996
, vol. 
58
 (pg. 
267
-
88
)
32
Friedman
J
Hastie
T
Tibshirani
R
Regularization paths for generalized linear models via coordinate descent
J Stat Softw
 , 
2010
, vol. 
33
 (pg. 
1
-
22
)
33
Yuan
M
Lin
Y
Efficient empirical Bayes variable selection and estimation in linear models
J Am Stat Assoc
 , 
2005
, vol. 
100
 (pg. 
1215
-
25
)
34
Mutshinda
CM
Sillanpää
MJ
Extended Bayesian LASSO for multiple quantitative trait loci mapping and unobserved phenotype prediction
Genetics
 , 
2010
, vol. 
186
 (pg. 
1067
-
75
)
35
Yuan
M
Lin
Y
Model selection and estimation in regression with grouped variables
J Royal Stat Soc: Ser B Stat Methodol
 , 
2006
, vol. 
68
 (pg. 
49
-
67
)
36
Donoho
DL
Johnstone
IM
Ideal spatial adaptation by wavelet shrinkage
Biometrika
 , 
1994
, vol. 
81
 (pg. 
425
-
55
)
37
Friedman
J
Hastie
T
Hoefling
H
, et al.  . 
Pathwise coordinate optimization
Ann Appl Stat
 , 
2007
, vol. 
2
 (pg. 
302
-
32
)
38
Churchill
GA
Doerge
RW
Empirical threshold values for quantitative trait mapping
Genetics
 , 
1994
, vol. 
138
 (pg. 
963
-
71
)

APPENDIX

The expectation and variance of indicator variable.

For genetically designed populations with two genotypes at each locus, such as BC, DH and RIL, only additive effect is considered in genetic model. Define forumla and let P1 and P2 be the conditional probabilities for the two genotypes given flanking marker information. The conditional expectation of indicator variable is  

formula
and its variance is  
formula

For F2 population, there are three genotypes, denoted by QQ, Qq and qq at each locus. Two genetic effects including additive and dominant effects are considered in genetic model. Define  

formula
as indicator variables of additive and dominant effects, respectively. Let P1, P2 and P3 are the conditional probabilities for the genotypes QQ, Qq and qq, given flanking marker information. The conditional expectations of indicator variables are  
formula
and their variance-covariance are  
formula
and Var(z2) = P2(1−P2).

The covariance of indicator variables between additive and dominant effects is  

formula

Author notes

*These authors have contributed equally to this work.