
RMTL: an R library for multi-task learning

Abstract

Motivation

Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research.

Results

We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable to regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data.

Availability and implementation

RMTL is an open-source R package, freely available at https://github.com/transbioZI/RMTL. It will also be made available on CRAN (cran.r-project.org).

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Multi-task learning (MTL) is a machine learning technique that explores and exploits the relatedness among a set of different learning tasks. Since its inception (Caruana, 1998), MTL has been used in numerous data-intensive research areas, including biomedical informatics (Feriante, 2015; Li et al., 2016; Widmer and Ratsch, 2012; Xu et al., 2011; Yuan et al., 2016; Zhou et al., 2013), speech and natural language processing (e.g. Wu et al., 2015), image processing and computer vision (e.g. Wang et al., 2009), as well as web-based applications (e.g. Chapelle et al., 2010).

A strong motivation to develop biomedical MTL applications stems from the necessity to integrate diverse data sources to explore the biological underpinnings of complex illnesses, such as schizophrenia. Previous research has shown that, for such illnesses, integrative multi-omics analyses open a new avenue for identifying etiological mechanisms, for example by taking genetic, expression and methylation data into account simultaneously (e.g. Lin et al., 2014). For such applications, multi-task learning offers the possibility to directly explore illness-related biological profiles that are linked across data modalities, and therefore a new route toward the identification of biomarker signatures.

Previous implementations of MTL have focused on knowledge transfer via regularization (Zhou et al., 2011), Bayesian methods (Greenlaw et al., 2017) or deep architectures (Yang and Hospedales, 2016). Here, we developed the first R library for MTL, offering a comprehensive machine learning pipeline that covers several types of MTL algorithms and can easily be applied to high-dimensional data.

In the following section, we briefly describe the RMTL package, including the implemented MTL methods (for details, see Supplementary Methods). The Results section then applies the algorithms to simulated data to demonstrate the performance and interpretability of the respective models.

2 Materials and methods

This package provides an automated, simple-to-use implementation of MTL, comprising five classification and five regression algorithms, which share knowledge across tasks according to different priors via regularization. All algorithms aim to minimize the same objective:

$$\min_{W} \sum_{i=1}^{t} \frac{1}{n_{i}} L(W_{i} \mid X_{i}, Y_{i}) + \Omega(W)$$

where $L(\cdot)$ is the loss function (logistic loss for classification or least-squares loss for regression), and $X = \{X_{i}\}$ and $Y = \{Y_{i}\}$ are the sets of predictor matrices and corresponding responses for all $t$ tasks. Here $X_{i} \in \mathbb{R}^{n_{i} \times p}$ and $Y_{i} \in \mathbb{R}^{n_{i} \times 1}$ are the predictor matrix and the response vector of task $i \in \{1, 2, \dots, t\}$. Accordingly, $n_{i}$ and $p$ refer to the number of subjects and the number of predictors of task $i$, respectively (all tasks share the same predictor space). Moreover, $W \in \mathbb{R}^{p \times t}$ is the coefficient matrix for all tasks, where $W_{i}$, the $i$th column of $W$, is the coefficient vector for task $i$.
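To make the objective concrete, the following Python sketch evaluates it for the regression case. It is an illustration under stated assumptions, not RMTL code: matrices are plain nested lists, $L$ is taken to be the sum of squared residuals $\|X_i W_i - Y_i\|^2$ (the exact scaling used by RMTL is given in the Supplementary Methods), and the regularizer $\Omega$ is passed in as a function. All names are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]  # row-major nested lists; W is p x t


def column(W: Matrix, i: int) -> List[float]:
    """Return the i-th column of W, i.e. the coefficient vector W_i of task i."""
    return [row[i] for row in W]


def least_squares_loss(w: List[float], X: Matrix, y: List[float]) -> float:
    """Assumed loss: sum of squared residuals ||X w - y||^2 for one task."""
    total = 0.0
    for x_row, y_val in zip(X, y):
        pred = sum(a * b for a, b in zip(x_row, w))
        total += (pred - y_val) ** 2
    return total


def mtl_objective(W: Matrix, Xs: List[Matrix], Ys: List[List[float]],
                  omega: Callable[[Matrix], float]) -> float:
    """Evaluate sum_i (1/n_i) L(W_i | X_i, Y_i) + Omega(W)."""
    data_term = sum(
        least_squares_loss(column(W, i), Xs[i], Ys[i]) / len(Xs[i])  # 1/n_i weighting
        for i in range(len(Xs))
    )
    return data_term + omega(W)
```

For example, passing `lambda W: lam * sum(abs(v) for row in W for v in row)` as `omega` gives an $\ell_1$-penalized (lasso-type) objective; the $1/n_i$ weighting keeps tasks with many subjects from dominating the shared fit.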

Knowledge transfer among tasks is achieved via a convex regularization term $\Omega(W)$ that jointly constrains the task models according to a chosen prior. In this package, five common regularization techniques are implemented to suit different applications: sparse structure, joint predictor selection, low-rank structure, network constraint for task relatedness and task clustering. We refer to these strategies as MTL_Lasso, MTL_L21, MTL_Trace, MTL_Graph and MTL_CMTL, respectively. They can be broadly divided into two classes: strategies for predictor selection (MTL_Lasso and MTL_L21) and strategies for exploring task relatedness (MTL_Graph, MTL_Trace and MTL_CMTL). While the former class exploits sparse patterns over the predictor space, the latter exploits task relatedness based on additional assumptions. For all algorithms, we implemented a solver based on the accelerated gradient descent method (Nesterov, 2013). The non-smooth convex regularization terms are handled with their proximal operators (Parikh and Boyd, 2014). Overall, the solver achieves a convergence rate of O(1/k²) after k iterations, which is optimal among first-order gradient methods. Further methodological details are given in the Supplementary Methods.
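The solver description above can be illustrated with a minimal Python sketch of an accelerated proximal gradient loop. It assumes least-squares loss and the standard penalty forms $\lambda\|W\|_1$ (lasso-type, as in MTL_Lasso) and $\lambda \sum_j \|w^j\|_2$ over the rows $w^j$ of $W$ (as in MTL_L21); the exact RMTL formulations are in the Supplementary Methods. The update rule is the FISTA variant of Nesterov-style acceleration with a fixed step size and no line search, so it should not be read as RMTL's implementation; all function names are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # W is p x t (rows = predictors, columns = tasks)


def prox_l1(W: Matrix, thr: float) -> Matrix:
    """Prox of thr * ||W||_1: elementwise soft-thresholding (lasso-type penalty)."""
    return [[math.copysign(max(abs(v) - thr, 0.0), v) for v in row] for row in W]


def prox_l21(W: Matrix, thr: float) -> Matrix:
    """Prox of thr * sum_j ||w^j||_2: shrinks each row (one predictor across
    all tasks) as a group, zeroing it entirely when its norm is <= thr."""
    out = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - thr / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in row])
    return out


def grad_data_term(W: Matrix, Xs: List[Matrix], Ys: List[List[float]]) -> Matrix:
    """Gradient of sum_i (1/n_i) ||X_i W_i - Y_i||^2 with respect to W."""
    p, t = len(W), len(W[0])
    G = [[0.0] * t for _ in range(p)]
    for i in range(t):
        X, y, n = Xs[i], Ys[i], len(Xs[i])
        w = [W[r][i] for r in range(p)]
        resid = [sum(a * b for a, b in zip(x_row, w)) - y_k
                 for x_row, y_k in zip(X, y)]
        for r in range(p):
            G[r][i] = 2.0 / n * sum(X[k][r] * resid[k] for k in range(n))
    return G


def accelerated_prox_grad(Xs: List[Matrix], Ys: List[List[float]], lam: float,
                          step: float,
                          prox: Callable[[Matrix, float], Matrix] = prox_l21,
                          iters: int = 500) -> Matrix:
    """FISTA-style accelerated proximal gradient with a fixed step size.
    `step` must not exceed 1 / (Lipschitz constant of the gradient)."""
    p, t = len(Xs[0][0]), len(Xs)
    W = [[0.0] * t for _ in range(p)]
    V = [row[:] for row in W]  # extrapolated search point
    s = 1.0
    for _ in range(iters):
        G = grad_data_term(V, Xs, Ys)
        # Gradient step on the smooth loss, then prox step on the penalty.
        Z = [[v - step * g for v, g in zip(v_row, g_row)]
             for v_row, g_row in zip(V, G)]
        W_new = prox(Z, step * lam)
        # Nesterov momentum: extrapolate from the last two iterates.
        s_new = (1.0 + math.sqrt(1.0 + 4.0 * s * s)) / 2.0
        beta = (s - 1.0) / s_new
        V = [[a + beta * (a - b) for a, b in zip(new_row, old_row)]
             for new_row, old_row in zip(W_new, W)]
        W, s = W_new, s_new
    return W
```

Passing `prox=prox_l1` instead of the default switches from joint predictor selection to elementwise sparsity. Only the proximal step changes, which is what allows a single accelerated solver to serve several regularizers.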

3 Results

Predictive performance and model interpretability of the implemented algorithms were explored using simulated data. The simulated datasets were generated from a ground truth model $W$ specified for each prior (Supplementary Fig. S1). We compared the ground truth and the learnt model as an indicator of model interpretability. For predictive comparison, the primary baseline method was conventional lasso, which reflects single-task learning performance. As a second baseline, we applied MTL with a lasso prior (MTL_Lasso) to explore the effect of an inappropriate choice of prior.
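The simulation protocol itself is described in the Supplementary Methods. As a rough illustration of what generating data from a ground truth $W$ can look like, the following hypothetical Python sketch builds a row-sparse $W$ (the shared-predictor structure that MTL_L21 targets) and draws $Y_i = X_i W_i + \varepsilon$ for each task. Standard normal predictors, Gaussian noise and all parameter names are assumptions, not the paper's settings.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def simulate_l21_data(p: int, t: int, n: int, k: int, noise: float = 0.1,
                      seed: int = 0) -> Tuple[Matrix, List[Matrix], List[List[float]]]:
    """Hypothetical generator: ground truth W (p x t) with k active rows shared by
    all tasks, then X_i ~ N(0, 1) and Y_i = X_i W_i + Gaussian noise per task."""
    rng = random.Random(seed)
    active = set(rng.sample(range(p), k))  # predictors active in every task
    W = [[rng.gauss(0.0, 1.0) if r in active else 0.0 for _ in range(t)]
         for r in range(p)]
    Xs, Ys = [], []
    for i in range(t):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        y = [sum(X[s][r] * W[r][i] for r in range(p)) + rng.gauss(0.0, noise)
             for s in range(n)]
        Xs.append(X)
        Ys.append(y)
    return W, Xs, Ys
```

Calling `simulate_l21_data(p=400, t=5, n=50, k=40)` mirrors the 40-of-400 sparsity reported for MTL_L21 in Section 3.1; the number of tasks, sample size and noise level are arbitrary choices.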

3.1 Model interpretability

Supplementary Figure S1a shows the coefficient matrices of MTL_Lasso and MTL_L21 and demonstrates that MTL_Lasso identified approximately half as many predictors as the ground truth contains. This may be because highly correlated predictors exist in the high-dimensional space (Zou and Hastie, 2005). As a consequence, and similar to conventional lasso, MTL_Lasso tended to select only one among several correlated predictors. Despite this, 75% of the selected predictors were ground truth predictors (precision). For MTL_L21, the ground truth was highly sparse: only 40 out of 400 predictors were active, and these were active in all tasks. MTL_L21 successfully identified 39 of them (sensitivity: 97.5%), with a precision of 72%. These results indicate that the MTL algorithms could successfully identify ground truth predictors.
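Sensitivity and precision follow their usual definitions: sensitivity is the fraction of ground truth predictors that were selected, and precision is the fraction of selected predictors that belong to the ground truth. A minimal Python sketch, assuming a predictor counts as selected when any of its coefficients across tasks exceeds a small tolerance (the tolerance and function names are hypothetical):

```python
from typing import List, Set, Tuple


def row_support(W: List[List[float]], tol: float = 1e-8) -> Set[int]:
    """Indices of predictors (rows of W) with a non-negligible coefficient in any task."""
    return {r for r, row in enumerate(W) if any(abs(v) > tol for v in row)}


def selection_metrics(true_support: Set[int], selected: Set[int]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of a selected predictor set."""
    tp = len(true_support & selected)  # true positives
    sensitivity = tp / len(true_support) if true_support else 0.0
    precision = tp / len(selected) if selected else 0.0
    return sensitivity, precision
```

For MTL_L21 above, 39 recovered out of 40 true predictors gives a sensitivity of 39/40 = 0.975; the precision additionally depends on how many predictors were selected in total.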

The relatedness of tasks was represented by the pairwise correlation between task models. Supplementary Figure S1b shows that all methods correctly captured the pairwise relatedness present in the ground truths. In particular, MTL_Graph incorporated a strong network prior such that the “in-group” differences became zero. This may be because, among all priors, the network prior provided the most complete information about task relatedness.
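Assuming that "pairwise correlation between models" refers to the Pearson correlation between coefficient vectors $W_i$ and $W_j$, the task-relatedness matrix can be computed as in the following Python sketch (function names are hypothetical). Applying the same function to the ground truth $W$ and to a learnt $W$ yields the two matrices to compare.

```python
import math
from typing import List

Matrix = List[List[float]]  # W is p x t; column i holds task i's coefficients


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; NaN if either vector has zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def task_correlation(W: Matrix) -> Matrix:
    """t x t matrix of pairwise correlations between task coefficient vectors."""
    t = len(W[0])
    cols = [[row[i] for row in W] for i in range(t)]
    return [[pearson(cols[i], cols[j]) for j in range(t)] for i in range(t)]
```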

3.2 Predictive performance

Supplementary Figure S2 indicates that conventional lasso failed to yield accurate predictions on all simulated datasets except the one generated with the $l_{21}$ prior. Compared to this baseline, the MTL models improved accuracy by 18.7% on average. MTL_Lasso, which incorporates an inappropriate prior, achieved an average accuracy of 67% and was substantially inferior to MTL models with appropriate priors (average accuracy: 79.2%).

4 Conclusion

In this study, we developed an R library for multi-task learning comprising 10 algorithms that incorporate five different priors. MTL models outperformed two baseline methods when applied to simulated data. The learnt models also showed high interpretability: selected predictors and task relatedness closely matched the respective ground truths.

Funding

This study was supported by the Deutsche Forschungsgemeinschaft (DFG), SCHW 1768/1-1. In addition, this work is supported in part by the National Science Foundation under grants IIS-1615597 (to JZ) and IIS-1749940 (to JZ).

Conflict of Interest: none declared.

References

Caruana R. (1998) Multitask Learning. Springer, USA.

Chapelle O. et al. (2010) Multi-task learning for boosting with application to web search ranking. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10).

Feriante J. (2015) Massively Multitask Deep Learning for Drug Discovery. University of Wisconsin-Madison.

Greenlaw K. et al. (2017) A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics, 33, 2513–2522.

Li Y. et al. (2016) A Multi-Task Learning Formulation for Survival Analysis. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).

Lin D. et al. (2014) Integrative analysis of multiple diverse omics datasets by sparse group multitask regression. Front. Cell Dev. Biol., 2, 62.

Nesterov Y. (2013) Gradient methods for minimizing composite functions. Math. Program., 140, 125–161.

Parikh N., Boyd S. (2014) Proximal algorithms. Found. Trends Optim., 1, 127–239.

Wang X. et al. (2009) Boosted multi-task learning for face verification with applications to web image and video search. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Widmer C., Ratsch G. (2012) Multitask learning in computational biology. JMLR, 27, 207–216.

Wu Z. et al. (2015) Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Xu Q. et al. (2011) Multitask learning for protein subcellular location prediction. IEEE/ACM Trans. Comput. Biol. Bioinform., 8, 748–759.

Yang Y., Hospedales T. (2016) Deep multi-task representation learning: a tensor factorisation approach. arXiv preprint arXiv:1605.06391.

Yuan H. et al. (2016) Multitask learning improves prediction of cancer drug sensitivity. Sci. Rep., 6, 31619.

Zhou J. et al. (2011) Malsar: Multi-task Learning via Structural Regularization. Vol. 21. Arizona State University.

Zhou J. et al. (2013) Modeling disease progression via multi-task learning. Neuroimage, 78, 233–248.

Zou H., Hastie T. (2005) Regularization and variable selection via the elastic net. J. R. Stat. Soc. B, 67, 301–320.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
