## Abstract

The iteratively reweighted least square (IRLS) method is mostly identical to maximum likelihood (ML) method in terms of parameter estimation and power of quantitative trait locus (QTL) detection. But the IRLS is greatly superior to ML in terms of computing speed and the robustness of parameter estimation. In conjunction with the priors of parameters, ML can analyze multiple QTL model based on Bayesian theory, whereas under a single QTL model, IRLS has very limited statistical power to detect multiple QTLs. In this study, we proposed the iteratively reweighted least absolute shrinkage and selection operator (IRLASSO) for extending IRLS to simultaneously map multiple QTLs. The LASSO with coordinate descent step is employed to efficiently estimate non-zero genetic effect of each locus scanned over entire genome. Simulations demonstrate that IRLASSO has a higher precision of parameter estimation and power to detect QTL than IRLS, and is able to estimate residual variance more accurately than the unweighted LASSO based on LS. Especially, IRLASSO is very fast, usually taking less than five iterations to converge. The barley dataset from the North American Barley Genome Mapping Project is reanalyzed by our proposed method.

## INTRODUCTION

In earlier quantitative trait loci (QTL) mapping based on a single QTL model, least square (LS) method developed by Haley and Knott [1] was generally considered to be simpler and more robust than maximum likelihood method [2, 3]. However, LS method would result in the inflation of residual variance after replacing QTL genotype indicator by its expectation conditional on the flanking markers [1]. By quantifying the inflated residual variance, iteratively reweighted least square (IRLS) was therefore proposed by Xu [4–6]. Simulations demonstrated that for normal traits, the performance of IRLS is almost identical to that of maximum likelihood (ML), but the computational speed of IRLS is many times faster than that of ML [5]. In addition, IRLS inherits the robustness of LS. The estimates of IRLS are essentially suboptimal when the weighted value was treated as a constant. As a result, Han and Xu [7] introduced the Fisher scoring algorithm for finding the maximum likelihood estimates of QTL parameters. Within the framework of generalized linear model, subsequently, the IRLS was extended to analyzing discrete traits, such as binary, ordinal and Poisson traits, by taking into account the heterogeneity of individual residual variances [8]. Meanwhile, this method is more appropriate to identify large QTL that locates in large maker intervals than the ML [7, 8].

As most of quantitative traits are controlled by multiple genetic and environmental factors, aforementioned interval mapping methods are not optimal without considering other linked QTLs. Thus, some mapping methodologies based on multiple QTL model were developed to improve the power of QTL detection. Theoretically, the multiple QTL mapping can be regarded as the statistical question of model search and selection, for either non-Bayesian model selection methods [9–16] or Bayesian model selection methods [17–22]. Compared with interval mapping, these advanced statistical methods tend to be more sensitive to the experimental designs, the data structures, the initial values of the parameters and other factors such as assumptions of the models. Specifically, the results of Bayesian mapping are hard to be interpreted due to lack of significance test. Although the Bayes factor was used to infer the significance of QTL in Bayesian model selection mapping by Yi *et al.* [20, 21], its distribution also depends on population structure and marker density. Che and Xu [23] have proposed significance test in Bayesian shrinkage analysis by using permutation test, but computation burden of the approach is heavy.

Currently, the Bayesian shrinkage methods increasingly become a promising and active research area of QTL mapping and genomic selection [15, 24–28], in which, all estimated effects are forced to shrink toward zero. The fully Bayesian shrinkage has heavy computing burden and thus has not been widely applied to mapping QTL. In contrast, empirical Bayes methods [24, 29, 30] can produce results similar to those of the fully Bayesian approaches but could be quickly computed. LASSO is another shrinkage method that has been widely used in regression analysis for large models [31]. It can solve large equations extremely fast using coordinate descent algorithm [32]. If a double-exponential prior is assigned to regression coefficient, the Bayesian estimates are equivalent to the LASSO estimates [33]. Many improved LASSO [16, 26, 29, 34] have been developed for QTL mapping. By choosing various priors for QTL effects, those mapping methods may enhance shrinking efficiency to a certain extent, but their computing speeds are far inferior to regular LASSO with a double-exponential prior [31]. Moreover, these studies only analyze known markers and do not provide an efficient and convincing method for statistically inferring QTL.

In this study, we extend Xu’s IRLS to simultaneously mapping multiple QTLs. Fast LASSO [32, 35] is employed to estimate genetic effect of each locus scanned over entire genome. This mapping method is known as iteratively reweighted LASSO. It can, not only estimate QTL effects, but also test the significance of QTLs, like regular interval mapping. The performance of our proposed method is demonstrated via computer simulation. Finally, we apply this method to reanalyzing the barley dataset from the North American Barley Genome Mapping Project.

## METHOD

### Genetic model

Our mapping method is developed for improving linkage analysis. It requires that on *n* individuals, phenotypes are observed and *m* co-dominant markers with known linkage map are genotyped in a genetically designed population. Prior to mapping QTL, all positions on whole genome may be candidates of QTLs. The difference among present mapping methods lies mainly in the way to search QTLs on the genome. Traditional interval mapping identifies QTL by genome-wide scanning, whereas Bayesian mapping by randomly sampling QTL positions. For mapping multiple QTLs, we partition the entire genome into *k* loci that include not only the marker positions but also points between markers. The points between markers are equally spaced by the length of scanning step or tuning value used in sampling QTL position. The linear model of phenotypic value (*y*_{i}, for *i* = 1,2, … ,*n*) on QTL effects of *k* loci is then expressed as

_{j}is the

*j*th systematic environmental effect that is irrelevant to QTL effects,

*x*

_{ij}is the corresponding design matrix on the

*i*th individual, γ

_{j}is the

*j*th locus genetic effect,

*z*

_{ij}is the corresponding incidence matrix determined by the genotypes of locus

*k*, ε

_{i}is the residual error, which is assumed to be a normal distribution with σ

^{2}being residual variance.

In the Equation (1), *z*_{ij} is missing value unless the locus is just a fully informative marker. Following LS method by Haley and Knott [1], we can simply estimate unknown *z*_{ij} by the expectation of *z*_{ij} conditional on flanking markers genotype. Thus, Equation (1) becomes

*u*

_{ij}=

*E*(

*z*

_{ij}) is the conditional expectation of

*z*

_{ij}, which depends on the genetically designed population (see the Appendix for details).

Considering the heterogeneity of residual variance after replacing *z*_{ij} by its conditional expectation *u*_{ij}, we exactly write Equation (1) by

*z*

_{ij}with the residual error ε

_{i}. Then, Equation (2) can be rewritten as

Define and . In the notation of matrix, the Equation (5) is described as

and its variance is with for*j*and

*j*′=1, 2 ⋯,

*k*being the covariance matrix of indicator variances. Different from what Xu [6] defined earlier, Σ

_{i}is derived from multiple QTLs and considers the linkages among multiple QTLs,where

*r*

_{jj′}is recombination coefficient between loci

*j*and

*j*′,

*r*

_{jj′}= 0 if

*j*=

*j*′. and are defined as conditional variances at loci

*j*and

*j*′ for the

*i*th individual, which have explicit form:

The equation has different form due to genetically designed population, as shown in the Appendix. If these loci overlap with completely informative markers, under the assumption of no segregation distortion.

### Parameter estimation

The *s* in Equation (5) can be relatively small, especially in genetically designed population with a single family, but *k* may vary large and sometimes may be even larger than the sample size *n*. Thus, we decide to adopt a modified LASSO to estimate the effects of these loci.

Transform residual variance into with being weighted value, and let , and . By minimizing , the regression coefficients in Equation (5) can be solved using extremely fast LASSO with a coordinate descent step [35]:to partially optimize with respect to γ_{l}(*l* = 1,2,…,*k*), the gradient at is computed by

The equation is required to treat separately by , or . Simple calculus for the coordinate-wise update [36, 37] is implemented in the following way

where, is the fitted value excluding the genetic effect for the*l*th loci, and denoted by

*z*is the simple least-squares coefficient when fitting this partial residual to the

*l*th loci.

*S*(

*z*,λ) is the soft-thresholding operator with value

As a tuning parameter, λ can be chosen by cross-validation for LASSO. In our computation, however, path-wise coordinate descent proposed by Friedman *et al.* [32] is adopted to choose lambda efficiently.

Because *w*_{i} is a function of estimated parameters, iterations are required. The iteration process is recommended as follow:

Initialize

*w*_{i}= 1Estimate β

_{j}, γ_{j}and σ^{2}using unweighted LASSOUpdate

*w*_{i}using σ^{2}and non-zero β_{j}and γ_{j}Estimate β

_{j}and γ_{j}using the weighted LASSORepeat step 3 and 4 until ≤10

^{−6}or 10^{−8}, where*t*is iterating time.

Herein, we call the iteration process as iteratively reweighted LASSO.

### Significance test

A subtset of loci with non-zero genetic effects are retained after the shrinkage of IRLASSO. Once the number of non-zero effects is greater than the sample size, we can drop those linked loci by assuming that there is only one QTL within a marker interval. For the retained loci with non-zero effects, the iteratively weighted least square estimates of the parameters are

where .The variance–covariance matrix of the estimated parameter is calculated as

From Equations (10), (11) and (12), we can form *t*-test statistic to test the significance of the retained loci with non-zero effects.

Considering that *t*-test statistic is influenced by population structure and marker density, we use the critical threshold generated by permutation tests [38]. By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of the maximum *t**-*statistic values are calculated, from the distribution of which, the critical threshold is obtained.

### Simulations

Simulations are conducted to investigate statistical behavior of our proposed method. In a backcross population, we simulate six chromosomes, each with a length of 100 cM and evenly placed co-dominant markers. The 2, 2, 1, 2, 1 and 2 QTLs are respectively put on six chromosomes, whose positions and genetic effects are listed in Tables 1 and 2. To access the influence of sample size and marker density on mapping results, sample size is designed as the three levels: 150, 200 and 300. For sample sizes of 150 and 300, 11 markers are distributed on each chromosome. For sample sizes of 200, there are 21 markers on each of the first 2 chromosomes, 11 on the middle 2 chromosomes and 5 on the rest two chromosomes. Only one QTL is put on each chromosome, whose genetic effects are classified into two levels of 2.5 and 0.8 on the chromosome with same marker density.

Sample size | Method | QTL no. | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||

True | 23 | 56 | 148 | 193 | 267 | 332 | 390 | 478 | 522 | 574 | |

150 | IRLASSO | 25.1 (2.6) | 58.5 (2.9) | 151.5 (3.7) | 204.3 (12.3) | 264.8 (8.0) | 335.7 (4.3) | 403.9 (13.5) | 474.8 (11.5) | 524.9 (5.3) | 580.5 (4.6) |

UWLASSO | 25.0 (2.8) | 58.4 (2.9) | 151.2 (4.0) | 204.8 (12.5) | 264.2 (8.1) | 335.6 (4.4) | 403.9 (13.8) | 474.2 (11.7) | 524.5 (5.6) | 580.5 (4.7) | |

IRLS | 44.6 (8.3) | 48.3 (9.3) | 166.1 (13.1) | 186.7 (10.8) | 267.1 (20.1) | 332.4 (15.3) | 386.7 (15.2) | 480.3 (8.3) | 533.5 (7.5) | 580.1 (13.8) | |

300 | IRLASSO | 25.4 (1.9) | 58.8 (2.1) | 151.6 (3.0) | 196.6 (2.3) | 269.5 (5.0) | 337.1 (2.9) | 395.3 (3.2) | 482.4 (2.1) | 526.2 (4.7) | 580.6 (3.1) |

UWLASSO | 25.3 (2.2) | 58.5 (2.6) | 151.4 (3.2) | 196.7 (2.3) | 269.4 (5.0) | 336.9 (3.3) | 395.0 (3.4) | 482.2 (2.5) | 525.8 (5.2) | 580.3 (3.4) | |

IRLS | 45.0 (7.7) | 48.2 (8.8) | 169.1 (11.2) | 188.8 (9.2) | 265.9 (20.3) | 332.4 (13.7) | 388.8 (13.8) | 481.2 (5.8) | 537.8 (17.1) | 582.3 (10.6) |

Sample size | Method | QTL no. | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||

True | 23 | 56 | 148 | 193 | 267 | 332 | 390 | 478 | 522 | 574 | |

150 | IRLASSO | 25.1 (2.6) | 58.5 (2.9) | 151.5 (3.7) | 204.3 (12.3) | 264.8 (8.0) | 335.7 (4.3) | 403.9 (13.5) | 474.8 (11.5) | 524.9 (5.3) | 580.5 (4.6) |

UWLASSO | 25.0 (2.8) | 58.4 (2.9) | 151.2 (4.0) | 204.8 (12.5) | 264.2 (8.1) | 335.6 (4.4) | 403.9 (13.8) | 474.2 (11.7) | 524.5 (5.6) | 580.5 (4.7) | |

IRLS | 44.6 (8.3) | 48.3 (9.3) | 166.1 (13.1) | 186.7 (10.8) | 267.1 (20.1) | 332.4 (15.3) | 386.7 (15.2) | 480.3 (8.3) | 533.5 (7.5) | 580.1 (13.8) | |

300 | IRLASSO | 25.4 (1.9) | 58.8 (2.1) | 151.6 (3.0) | 196.6 (2.3) | 269.5 (5.0) | 337.1 (2.9) | 395.3 (3.2) | 482.4 (2.1) | 526.2 (4.7) | 580.6 (3.1) |

UWLASSO | 25.3 (2.2) | 58.5 (2.6) | 151.4 (3.2) | 196.7 (2.3) | 269.4 (5.0) | 336.9 (3.3) | 395.0 (3.4) | 482.2 (2.5) | 525.8 (5.2) | 580.3 (3.4) | |

IRLS | 45.0 (7.7) | 48.2 (8.8) | 169.1 (11.2) | 188.8 (9.2) | 265.9 (20.3) | 332.4 (13.7) | 388.8 (13.8) | 481.2 (5.8) | 537.8 (17.1) | 582.3 (10.6) |

Sample size | Method | QTL no. | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||

True effect | 1.50 | 2.00 | 0.72 | 1.10 | −0.22 | 0.70 | −0.65 | 1.25 | 0.35 | −0.80 | |

150 | IRLASSO | 1.52 (0.19) | 1.98 (0.18) | 0.91 (0.28) | 0.74 (0.54) | −0.21 (0.17) | 0.61 (0.19) | −0.40 (0.41) | 1.12 (0.24) | 0.35 (0.16) | −0.77 (0.15) |

UWLASSO | 1.52 (0.19) | 1.98 (0.19) | 0.91 (0.28) | 0.71 (0.56) | −0.21 (0.18) | 0.61 (0.20) | −0.39 (0.42) | 1.10 (0.26) | 0.34 (0.16) | −0.77 (0.16) | |

IRLS | 2.92 (0.22) | 2.95 (0.22) | 1.43 (0.29) | 1.52 (0.30) | −0.28 (0.48) | 0.50 (0.45) | −0.34 (0.52) | 1.29 (0.28) | −0.08 (0.52) | −0.72 (0.39) | |

300 | IRLASSO | 1.55 (0.11) | 1.95 (0.12) | 0.72 (0.09) | 1.09 (0.10) | −0.23 (0.07) | 0.70 (0.10) | −0.65 (0.10) | 1.24 (0.07) | 0.35 (0.10) | −0.79 (0.10) |

UWLASSO | 1.55 (0.14) | 1.94 (0.14) | 0.71 (0.11) | 1.09 (0.11) | −0.22 (0.09) | 0.69 (0.11) | −0.65 (0.12) | 1.24 (0.08) | 0.35 (0.11) | −0.78 (0.12) | |

IRLS | 2.88 (0.18) | 2.90 (0.18) | 1.35 (0.22) | 1.46 (0.21) | −0.26 (0.32) | 0.49 (0.33) | −0.39 (0.40) | 1.27 (0.21) | −0.15 (0.42) | −0.72 (0.26) |

Sample size | Method | QTL no. | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||

True effect | 1.50 | 2.00 | 0.72 | 1.10 | −0.22 | 0.70 | −0.65 | 1.25 | 0.35 | −0.80 | |

150 | IRLASSO | 1.52 (0.19) | 1.98 (0.18) | 0.91 (0.28) | 0.74 (0.54) | −0.21 (0.17) | 0.61 (0.19) | −0.40 (0.41) | 1.12 (0.24) | 0.35 (0.16) | −0.77 (0.15) |

UWLASSO | 1.52 (0.19) | 1.98 (0.19) | 0.91 (0.28) | 0.71 (0.56) | −0.21 (0.18) | 0.61 (0.20) | −0.39 (0.42) | 1.10 (0.26) | 0.34 (0.16) | −0.77 (0.16) | |

IRLS | 2.92 (0.22) | 2.95 (0.22) | 1.43 (0.29) | 1.52 (0.30) | −0.28 (0.48) | 0.50 (0.45) | −0.34 (0.52) | 1.29 (0.28) | −0.08 (0.52) | −0.72 (0.39) | |

300 | IRLASSO | 1.55 (0.11) | 1.95 (0.12) | 0.72 (0.09) | 1.09 (0.10) | −0.23 (0.07) | 0.70 (0.10) | −0.65 (0.10) | 1.24 (0.07) | 0.35 (0.10) | −0.79 (0.10) |

UWLASSO | 1.55 (0.14) | 1.94 (0.14) | 0.71 (0.11) | 1.09 (0.11) | −0.22 (0.09) | 0.69 (0.11) | −0.65 (0.12) | 1.24 (0.08) | 0.35 (0.11) | −0.78 (0.12) | |

IRLS | 2.88 (0.18) | 2.90 (0.18) | 1.35 (0.22) | 1.46 (0.21) | −0.26 (0.32) | 0.49 (0.33) | −0.39 (0.40) | 1.27 (0.21) | −0.15 (0.42) | −0.72 (0.26) |

The simulated datasets are analyzed by using IRLASSO, unweighted LASSO (UWLASSO) and IRLS. The critical values of the test statistic for declaring statistical significance may be different due to different methods and situations. Under the null model with zero genetic effects, we simulate 500 additional samples to determine the critical values (Table 3). The 400 replicated simulations are used to estimate QTL parameters and to access the statistical power of QTL detection. Statistical power of QTL detection is calculated by each locus as the percentage of the number of those simulations that statistic value exceeds critical value at the locus. Also, false positive rate is assessed with the 400 replicated simulations under the null model with zero genetic effects.

Method | Sample size | ||
---|---|---|---|

150 | 200 | 300 | |

IRLASSO | 2.39 | 2.39 | 2.54 |

UWLASSO | 2.38 | 2.36 | 2.55 |

IRLS | 2.90 | 2.83 | 2.98 |

Method | Sample size | ||
---|---|---|---|

150 | 200 | 300 | |

IRLASSO | 2.39 | 2.39 | 2.54 |

UWLASSO | 2.38 | 2.36 | 2.55 |

IRLS | 2.90 | 2.83 | 2.98 |

For sample sizes of 150 and 300, the statistical powers to detect QTLs with three mapping methods are given in Table 4, along with false positive rate and exactly identifying proportion for all true QTLs. As can be seen, although the statistical powers at each locus between IRLASSO and UWLASSO are similar, statistical powers of IRLS are greatly lower than those of IRLASSO and UWLASSO at those loci with smaller genetic effects. False positive rates are <10% for the two sample sizes and three mapping methods, but IRLASSO performs lower false positive rates than the others. Exactly identifying proportions for all true QTLs are generally low due to the difficulty to detect the fifth and ninth QTLs with low genetic effects, although IRLASSO is capable of identifying all true QTLs simultaneously with higher statistical power than UWLASSO and IRLS. The two LASSO methods can estimate QTL effects more accurately than IRLS (Table 2). Generally, the statistical performance of each mapping method increases as sample size and genetic effect increase. Compared with UWLASSO, especially, the IRLASSO is able to precisely evaluate residual variance. With IRLASSO and UWLASSO, the estimates (standard deviations) are 1.09 (0.21) and 1.88 (0.24), respectively, for sample size of 150, whereas 1.07 (0.17) and 1.91 (0.20), respectively, for sample size of 300.

Sample size | Method | Statistical power of each QTL | FPR | EIP | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||||

150 | IRLASSO | 100 | 100 | 96.5 | 100 | 13.5 | 98 | 96.3 | 100 | 41 | 96 | 4.8 | 12.3 |

UWLASSO | 100 | 100 | 94 | 100 | 13.3 | 94.5 | 93.5 | 98.5 | 39 | 96.3 | 5.5 | 12 | |

IRLS | 100 | 100 | 88 | 95 | 0.5 | 8 | 2.5 | 79 | 1.3 | 24 | 9 | 0 | |

300 | IRLASSO | 100 | 100 | 100 | 100 | 31.5 | 100 | 99 | 100 | 72.5 | 100 | 3 | 31.3 |

UWLASSO | 100 | 100 | 100 | 100 | 31.3 | 99.5 | 98.5 | 100 | 71 | 99 | 5 | 31 | |

IRLS | 100 | 100 | 99 | 100 | 1.5 | 15 | 12 | 98 | 3.5 | 42 | 6.3 | 0.3 |

Sample size | Method | Statistical power of each QTL | FPR | EIP | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||||

150 | IRLASSO | 100 | 100 | 96.5 | 100 | 13.5 | 98 | 96.3 | 100 | 41 | 96 | 4.8 | 12.3 |

UWLASSO | 100 | 100 | 94 | 100 | 13.3 | 94.5 | 93.5 | 98.5 | 39 | 96.3 | 5.5 | 12 | |

IRLS | 100 | 100 | 88 | 95 | 0.5 | 8 | 2.5 | 79 | 1.3 | 24 | 9 | 0 | |

300 | IRLASSO | 100 | 100 | 100 | 100 | 31.5 | 100 | 99 | 100 | 72.5 | 100 | 3 | 31.3 |

UWLASSO | 100 | 100 | 100 | 100 | 31.3 | 99.5 | 98.5 | 100 | 71 | 99 | 5 | 31 | |

IRLS | 100 | 100 | 99 | 100 | 1.5 | 15 | 12 | 98 | 3.5 | 42 | 6.3 | 0.3 |

FPR, false positive rate; EIP, exactly identifying proportion.

Table 5 shows the statistical powers of QTL detection and estimates of QTL parameters obtained with three mapping methods for sample size of 200. No influence of sparse markers on convergence to the correct solutions has been found. As expected, the statistical power and precision of parameter estimation increase with the increased marker density and QTL effect. The residual variances (standard deviations) are estimated as 1.28 (0.23) and 2.52 (0.29) by IRLASSO and UWLASSO methods, respectively, again demonstrating the capability of more accuracy estimating residual variance by IRLASSO than UWLASSO method.

Method | QTL no. | ||||||
---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | ||

True | Position | 56 | 157 | 258 | 359 | 460 | 561 |

Effect | 2.5 | 0.8 | 2.5 | 0.8 | 2.5 | 0.8 | |

IRLASSO | Power | 100 | 99 | 100 | 99.3 | 100 | 95.3 |

Position | 58.09 (2.98) | 461.81 (4.90) | 260.96 (2.47) | 361.74 (4.09) | 159.34 (2.18) | 563.60 (4.72) | |

Effect | 2.47 (0.13) | 0.80 (0.11) | 2.47 (0.11) | 0.79 (0.10) | 2.49 (0.11) | 0.80 (0.11) | |

UWLASSO | Power | 100 | 98 | 100 | 99 | 100 | 98.5 |

Position | 57.83 (3.23) | 462.10 (5.19) | 260.91 (2.42) | 361.56 (4.05) | 158.80 (3.18) | 563.45 (5.03) | |

Effect | 2.46 (0.17) | 0.81 (0.13) | 2.46 (0.12) | 0.80 (0.13) | 2.47 (0.16) | 0.80 (0.13) | |

IRLS | Power | 100 | 21 | 100 | 29 | 100 | 18.5 |

Position | 57.08 (2.35) | 458.64 (15.71) | 259.47 (3.91) | 358.20 (14.46) | 157.94 (3.04) | 560.17 (13.51) | |

Effect | 2.54 (0.27) | 0.88 (0.40) | 2.49 (0.30) | 0.89 (0.39) | 2.55 (0.26) | 0.91 (0.33) |

Method | QTL no. | ||||||
---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | ||

True | Position | 56 | 157 | 258 | 359 | 460 | 561 |

Effect | 2.5 | 0.8 | 2.5 | 0.8 | 2.5 | 0.8 | |

IRLASSO | Power | 100 | 99 | 100 | 99.3 | 100 | 95.3 |

Position | 58.09 (2.98) | 461.81 (4.90) | 260.96 (2.47) | 361.74 (4.09) | 159.34 (2.18) | 563.60 (4.72) | |

Effect | 2.47 (0.13) | 0.80 (0.11) | 2.47 (0.11) | 0.79 (0.10) | 2.49 (0.11) | 0.80 (0.11) | |

UWLASSO | Power | 100 | 98 | 100 | 99 | 100 | 98.5 |

Position | 57.83 (3.23) | 462.10 (5.19) | 260.91 (2.42) | 361.56 (4.05) | 158.80 (3.18) | 563.45 (5.03) | |

Effect | 2.46 (0.17) | 0.81 (0.13) | 2.46 (0.12) | 0.80 (0.13) | 2.47 (0.16) | 0.80 (0.13) | |

IRLS | Power | 100 | 21 | 100 | 29 | 100 | 18.5 |

Position | 57.08 (2.35) | 458.64 (15.71) | 259.47 (3.91) | 358.20 (14.46) | 157.94 (3.04) | 560.17 (13.51) | |

Effect | 2.54 (0.27) | 0.88 (0.40) | 2.49 (0.30) | 0.89 (0.39) | 2.55 (0.26) | 0.91 (0.33) |

### Examples

The barley dataset from the North American Barley Genome Mapping Project (http://wheat.pw.usda.gov/ggpages/SxM/) is used to demonstrate our proposed methods. This mapping population consists of 150 doubled haploids (DHs) derived from the cross Steptoe × Morex. A total of eight traits are observed under multiple different environments and 223 markers covering a genome of length ∼1500 cM are genotyped for each DH line.

Based on the above dataset, we adopt IRLASSO, UWLASSO and IRLS methods, respectively, to locate QTLs for five traits, including alpha amylase (20° U), grain protein (%), grain yield (MT/ha), height (cm) and lodging (%) in barley. The phenotypic values of these traits are measured by using the averages of observations across multiple environments.

For convenience to compare three mapping methods, we transform statistic *t* to –log(*P*), where *P* is the probability of greater than statistic value *t*. The genome-wide empirical critical thresholds for five traits are obtained by using 1000 permutation tests. Under the same mapping method, obviously, the critical thresholds are very near among the traits analyzed, as pointed by horizontal reference line in Figure 1. Figure 1 plots the profiles of –log(*P*) statistics over the genome for the five traits obtained with IRLASSO method. As shown, there are some peaks exceeding corresponding critical thresholds for all traits analyzed, providing the evidence for existence of QTLs controlling alpha amylase, grain protein, grain yield, height and lodging. Most of the QTLs detected, distribute on chromosomes 3 (six QTLs), 2 (five QTLs) and 7 (five QTLs), in which, some QTLs perform the pleiotropy, for instance, the QTL on marker ‘ABG399’ simultaneously governing grain protein, grain yield, height and lodging. Specifically, many QTLs are identified on markers: three markers for alpha amylase, five markers for grain protein, two markers for grain yield, three markers for height and two markers for lodging. UWLASSO identifies the totally same QTLs as those detected with IRLASSO, but more QTLs than those detected with IRLS. Other three traits have also been analyzed, which support the advantage of our proposed method (Results not shown).

Table 6 tabulates parameter estimates of QTLs detected with three mapping methods. With IRLASSO, most of the QTLs are identified to be positive effects, suggesting that the alleles of parent ‘Steptoe’ contribute to increase in all traits except for malt extract. The proportions of phenotypic variation explained by the detected QTLs ranged from 2% to 54%. The largest heritability is 0.54 of the QTL for height on chromosome 2, followed by 0.44 of the QTL for grain yield on chromosome 3 and the smallest one is 0.02 of the QTL for height on chromosome 6. For the same trait, the residual variance estimated by UWLASSO is consistently greater than that by IRLASSO, so that the UWLASSO generally underestimates the heritabilities of QTLs detected.

Trait no. | QTL no. | IRLASSO | UWLASSO | IRLS | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Chr-pos. (cM) | Marker interval | Additive effect | Residual variance | Heritability | −log(P) | Additive effect | Residual variance | Heritability | Chr-pos. (cM) | ||

1 | 1 | 1–8 | Brz ∼ ABC156D | 1.20 | 1.69 | 0.22 | 6.71 | 1.23 | 2.02 | 0.22 | 1–68 |

2 | 1–88.6 | KFP194 | 1.00 | 0.16 | 5.63 | 0.99 | 0.14 | ||||

3 | 2–68 | MWG557 ∼ ABG316C | 1.02 | 0.16 | 6.81 | 1.00 | 0.15 | 2–76 | |||

4 | 4–17.1 | CDO669 | 0.72 | 0.08 | 4.37 | 0.70 | 0.07 | 4–10 | |||

5 | 7–101.8 | WG364 | 0.86 | 0.12 | 6.31 | 0.87 | 0.11 | 7–80 | |||

2 | 1 | 2–76 | ABC167B ∼ bBE54D | 0.38 | 0.09 | 0.27 | 11.36 | 0.38 | 0.11 | 0.26 | 2–68 |

2 | 3–16 | ABA307B ∼ ABC171 | −0.14 | 0.04 | 4.26 | −0.13 | 0.03 | ||||

3 | 3–52.6 | ABG399 | 0.18 | 0.06 | 5.45 | 0.17 | 0.05 | ||||

4 | 4–119.5 | ABG500B | 0.26 | 0.13 | 7.96 | 0.23 | 0.10 | 4–112 | |||

5 | 6–6.8 | MWG620 | 0.14 | 0.04 | 4.46 | 0.13 | 0.03 | ||||

6 | 6–93.5 | Nar7 | 0.16 | 0.05 | 4.36 | 0.16 | 0.05 | ||||

7 | 7–52 | Ubi2 ∼ ksuA3A | 0.33 | 0.20 | 14.33 | 0.35 | 0.22 | 7–56 | |||

8 | 7–137.8 | MWG514B | −0.17 | 0.05 | 4.60 | −0.17 | 0.05 | ||||

3 | 1 | 3–52.6 | ABG399 | −0.26 | 0.07 | 0.44 | 16.66 | −0.26 | 0.08 | 0.42 | 3–56 |

2 | 6–67.5 | CDO497 | 0.12 | 0.09 | 4.17 | 0.12 | 0.09 | ||||

4 | 1 | 2–40 | MWG858 ∼ ABG358 | 5.60 | 3.72 | 0.54 | 42.53 | 5.58 | 4.48 | 0.52 | 2–39.3 |

2 | 2–71.8 | ABC167B | −2.01 | 0.07 | 9.30 | −1.96 | 0.06 | ||||

3 | 3–29.8 | MWG584 | 1.25 | 0.03 | 5.01 | 1.19 | 0.02 | 3–56.1 | |||

4 | 3–56 | ABG399 ∼ BCD828 | 2.86 | 0.14 | 18.69 | 2.92 | 0.14 | ||||

5 | 6–6 | ABG062 ∼ MWG620 | 0.99 | 0.02 | 4.62 | 0.97 | 0.02 | ||||

6 | 7–80 | ABC302 ∼ CDO57B | 2.66 | 0.12 | 16.72 | 2.74 | 0.13 | 7–78.3 | |||

7 | 7–137.8 | MWG514B | −1.23 | 0.03 | 5.53 | −1.21 | 0.02 | ||||

5 | 1 | 1–122.8 | KFP190 | −3.79 | 39.17 | 0.09 | 4.15 | 3.21 | 44.87 | 0.08 | 1–108 |

2 | 2–68.2 | ABG316C | −4.36 | 0.13 | 6.76 | 3.36 | 0.09 | ||||

3 | 2–168 | ABG316E ∼ Pcr1 | 3.75 | 0.09 | 6.40 | 2.19 | 0.04 | 2–154 | |||

4 | 3–54 | ABG399 ∼ BCD828 | 8.09 | 0.43 | 13.43 | 7.33 | 0.43 | 3–56 |

Trait no. | QTL no. | IRLASSO | UWLASSO | IRLS | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Chr-pos. (cM) | Marker interval | Additive effect | Residual variance | Heritability | −log(P) | Additive effect | Residual variance | Heritability | Chr-pos. (cM) | ||

1 | 1 | 1–8 | Brz ∼ ABC156D | 1.20 | 1.69 | 0.22 | 6.71 | 1.23 | 2.02 | 0.22 | 1–68 |

2 | 1–88.6 | KFP194 | 1.00 | 0.16 | 5.63 | 0.99 | 0.14 | ||||

3 | 2–68 | MWG557 ∼ ABG316C | 1.02 | 0.16 | 6.81 | 1.00 | 0.15 | 2–76 | |||

4 | 4–17.1 | CDO669 | 0.72 | 0.08 | 4.37 | 0.70 | 0.07 | 4–10 | |||

5 | 7–101.8 | WG364 | 0.86 | 0.12 | 6.31 | 0.87 | 0.11 | 7–80 | |||

2 | 1 | 2–76 | ABC167B ∼ bBE54D | 0.38 | 0.09 | 0.27 | 11.36 | 0.38 | 0.11 | 0.26 | 2–68 |

2 | 3–16 | ABA307B ∼ ABC171 | −0.14 | 0.04 | 4.26 | −0.13 | 0.03 | ||||

3 | 3–52.6 | ABG399 | 0.18 | 0.06 | 5.45 | 0.17 | 0.05 | ||||

4 | 4–119.5 | ABG500B | 0.26 | 0.13 | 7.96 | 0.23 | 0.10 | 4–112 | |||

5 | 6–6.8 | MWG620 | 0.14 | 0.04 | 4.46 | 0.13 | 0.03 | ||||

6 | 6–93.5 | Nar7 | 0.16 | 0.05 | 4.36 | 0.16 | 0.05 | ||||

7 | 7–52 | Ubi2 ∼ ksuA3A | 0.33 | 0.20 | 14.33 | 0.35 | 0.22 | 7–56 | |||

8 | 7–137.8 | MWG514B | −0.17 | 0.05 | 4.60 | −0.17 | 0.05 | ||||

3 | 1 | 3–52.6 | ABG399 | −0.26 | 0.07 | 0.44 | 16.66 | −0.26 | 0.08 | 0.42 | 3–56 |

2 | 6–67.5 | CDO497 | 0.12 | 0.09 | 4.17 | 0.12 | 0.09 | ||||

4 | 1 | 2–40 | MWG858 ∼ ABG358 | 5.60 | 3.72 | 0.54 | 42.53 | 5.58 | 4.48 | 0.52 | 2–39.3 |

2 | 2–71.8 | ABC167B | −2.01 | 0.07 | 9.30 | −1.96 | 0.06 | ||||

3 | 3–29.8 | MWG584 | 1.25 | 0.03 | 5.01 | 1.19 | 0.02 | 3–56.1 | |||

4 | 3–56 | ABG399 ∼ BCD828 | 2.86 | 0.14 | 18.69 | 2.92 | 0.14 | ||||

5 | 6–6 | ABG062 ∼ MWG620 | 0.99 | 0.02 | 4.62 | 0.97 | 0.02 | ||||

6 | 7–80 | ABC302 ∼ CDO57B | 2.66 | 0.12 | 16.72 | 2.74 | 0.13 | 7–78.3 | |||

7 | 7–137.8 | MWG514B | −1.23 | 0.03 | 5.53 | −1.21 | 0.02 | ||||

5 | 1 | 1–122.8 | KFP190 | −3.79 | 39.17 | 0.09 | 4.15 | 3.21 | 44.87 | 0.08 | 1–108 |

2 | 2–68.2 | ABG316C | −4.36 | 0.13 | 6.76 | 3.36 | 0.09 | ||||

3 | 2–168 | ABG316E ∼ Pcr1 | 3.75 | 0.09 | 6.40 | 2.19 | 0.04 | 2–154 | |||

4 | 3–54 | ABG399 ∼ BCD828 | 8.09 | 0.43 | 13.43 | 7.33 | 0.43 | 3–56 |

## DISCUSSION

In this study, we extend Xu’s IRLS for a single QTL model to simultaneously analyzing multiple QTLs, greatly improving the power to detect QTL. So-called IRLASSO method inherits quick convergence of Xu’s IRLS, and uses extremely fast LASSO with coordinate descent step to estimate non-zero QTL effects. Thus, the mapping method can identify multiple QTLs in manner of high computing efficiency, as compared with existing LASSO for QTL mapping. Through multiple regression analysis for few loci with non-zero effects, the IRLASSO method is capable of statistically inferring the significance of QTL detected. Like regular interval mapping, this significance test includes construction of test statistic and permutation test for determining critical value. It should be clarified that if the marker density is sufficiently high, there is no need for the IRLS because LS, ML and IRLS, all would be the same. High dense marker maps are not always available for many species. Therefore, our proposed method remains useful as long as marker density is low or there are missing genotypes or both. Missing genotypes will always happen no matter how dense a marker map is.

Although the unweighted LASSO inflates the estimation of residual variance, as demonstrated in simulations, it is mostly consistent with IRLASSO method in terms of parameter estimation and power of QTL detection. For simplification of computation, we can omit the iteratively weighted step for LASSO and then choose few loci with non-zero effects obtained with UWLASSO. In this study, the iteratively weighted step is used in multiple regression analysis for few loci with non-zero effects to accuracy evaluate residual variance. To enhance shrinking efficiency, actually, the LASSO used in this study needs to be optimized by using cross-validation. Cross-validation of LASSO can obtain fewer loci with non-zero effects, but it will consume more computing time than iteratively weighted step for LASSO. Our simulation shows that those loci with non-zero effects solved from unweighted LASSO completely cover those from cross-validation of LASSO. Therefore, the cross-validation of LASSO is an option in our mapping method. Without the iteratively weighted step and cross-validation for LASSO, our proposed mapping method is faster than Xu’s IRLS. The program that implemented the mapping method is written in R language, which is freely available upon request from authors.

Only one kind of genetic effect is considered in our model for the simplification of description, which is suitable for backcross (BC), DH and recombinant inbreeding lines (RIL). Our method can be easily applied to analyze multiple genetic effects at each locus in other genetically designed population, such as intercross and outbred populations. The extension of continuous quantitative traits to discrete traits is being studied in our lab. It should be noted that the iteratively weighted step is not straightforward to incorporate in the LASSO for generalized linear model. In comparison, the extension of main effect model to epistatic model is not hard to map multiple interacting QTLs.

We developed the iteratively reweighted least absolute shrinkage and selection operator (IRLASSO) for extending IRLS to simultaneously map multiple QTLs.

The LASSO with coordinate descent step is employed to efficiently estimate non-zero genetic effect of each locus scanned over entire genome.

The method can, not only estimate QTL effects, but also test the significance of QTLs, like regular interval mapping.

Without the iteratively weighted step and cross-validation for LASSO, our proposed mapping method is faster than Xu’s IRLS [4 – 6].

## FUNDING

The National Natural Science Foundations of China (grants 30972077, 31172190 to R.Y. and 31201989 to Y.L.); the Applied National Science and Technology Support Program in China (grant 2012BAD26B01 to Y.L.).

## References

### APPENDIX

The expectation and variance of indicator variable.

For genetically designed populations with two genotypes at each locus, such as BC, DH and RIL, only additive effect is considered in genetic model. Define and let *P*_{1} and *P*_{2} be the conditional probabilities for the two genotypes given flanking marker information. The conditional expectation of indicator variable is

For F_{2} population, there are three genotypes, denoted by QQ, Qq and qq at each locus. Two genetic effects including additive and dominant effects are considered in genetic model. Define

*P*

_{1},

*P*

_{2}and

*P*

_{3}are the conditional probabilities for the genotypes QQ, Qq and qq, given flanking marker information. The conditional expectations of indicator variables are and their variance-covariance are and

*Var*(

*z*

_{2}) =

*P*

_{2}(1−

*P*

_{2}).

The covariance of indicator variables between additive and dominant effects is