Abstract

Omics fields, such as genomics, transcriptomics and proteomics, have been transformed by the era of big data. The huge volume of high-dimensional, complexly structured data has rendered conventional machine learning algorithms inadequate. Fortunately, deep learning technology can contribute toward resolving these challenges. There is evidence that deep learning can handle omics data well and resolve omics problems. This survey aims to provide an entry-level guideline for researchers to understand and use deep learning to solve omics problems. We first introduce several deep learning models and then discuss several research areas which have combined omics and deep learning in recent years. In addition, we summarize the general steps involved in applying deep learning, which have not yet been systematically discussed in the existing literature on this topic. Finally, we compare the features and performance of current mainstream open-source deep learning frameworks and present the opportunities and challenges involved in deep learning. This survey will be a good starting point and guideline for omics researchers seeking to understand deep learning.

Introduction

The impressive achievements of Google's AlphaGo have inspired researchers outside the computing field to pay attention to deep learning technology. Deep learning is a machine learning method based on neural networks. Compared with traditional machine learning methods, deep learning tends to have more network layers and to require more data, while its ability to extract features automatically from raw data is greatly enhanced. Given massive data and this stronger feature learning ability, deep learning tends to achieve more satisfactory experimental results.

Deep learning technology has a long history. The earliest prototype was the MCP artificial neuron model developed by McCulloch and Pitts [1] in 1943. Rosenblatt [2] then proposed the concept of the perceptron on the basis of artificial neurons. In 1974, the backpropagation algorithm was proposed in Werbos' [3] doctoral thesis, which made multilayer neural networks trainable. The most significant breakthrough in this field occurred in 2006, when Hinton's algorithm effectively resolved the vanishing gradient problem in backpropagation and revealed the potential of deep learning technology [4]. The technology has since begun to develop rapidly, for the following three reasons:

  • With the arrival of the big data era, the amount of data has become huge, data dimensionality has increased and data structures have become more complex. Traditional machine learning methods, such as support vector machines, are not good at handling such data.

  • The development of computing hardware makes it feasible to train deep learning models.

  • The deep learning technology community, including big companies like Google, is growing rapidly every year, promoting the continuous development of this technology.

Figure 1: Approximate number of published articles. The number of articles is based on the search ‘deep learning’ and ‘deep learning + DNA/RNA/protein’ in https://apps.webofknowledge.com.

At present, deep learning technology has achieved great success in image recognition, speech recognition and natural language processing. In addition, many applications in bioinformatics, such as disease prediction using electronic health records [5, 6], the classification of biomedical images [7–10] and biological signal processing [11–13], have also benefited from deep learning. As an important discipline in biological science, omics is no exception. Omics data, represented by genome, transcriptome and proteome data, are now increasing exponentially. Many well-known biological data projects and databases, such as the Encyclopedia of DNA Elements [14] and the Gene Expression Omnibus (GEO) [15], provide a growing amount of publicly accessible data, which meets deep learning's need for massive data and enables it to be applied in omics.

In fact, the development of contemporary omics has become inseparable from the support of deep learning. On the one hand, although experimental methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy produce accurate results, they can be time-consuming and expensive. On the other hand, most existing data are diverse, complex and high dimensional, and resolving a problem often requires combining various types of data, which increases the difficulty of analysis. Deep learning technology has the potential to address both problems: compared with traditional experimental methods, it is faster and more economical; compared with traditional machine learning methods, it is better able to handle such complex data, making it easier to obtain accurate results. The approach of combining deep learning and omics has gained great popularity since 2010, as shown in Figure 1.

Prior to this article, many researchers have reviewed the application of deep learning in bioinformatics, biomedicine and other fields [16–22]. However, there has so far been no discussion specific to the application of deep learning in omics research. Unlike other works, we focus on genomics, transcriptomics, proteomics and related omics research, providing more detailed views of these areas. In addition, we provide a detailed guideline on how to apply deep learning technology in omics research, which has not been addressed in detail in previous works. First, we introduce several deep learning models commonly used in omics research. Then, we present application cases and the latest developments of deep learning in the field of omics over the past few years. The steps involved in using deep learning technology in omics research are also discussed. In addition, to enable researchers who are unfamiliar with deep learning to apply this technology, we summarize and compare several open-source deep learning frameworks and bioinformatics tools. Finally, we discuss the potential challenges and future opportunities in this field. We believe that this work will assist omics researchers in understanding deep learning technology.

Deep learning models in omics

Deep learning models are varied, and different models are appropriate for different types of problems. Here, we introduce three common models: deep neural networks (DNNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

Deep neural networks

In this section, we use the term DNNs to refer to fully connected neural networks, which comprise the multilayer perceptron (MLP) [23], the auto-encoder [24] and the restricted Boltzmann machine (RBM) [25].

MLP is also known as a multilayer neural network: in addition to the input and output layers, it contains multiple hidden layers. One of the simplest MLP models is shown in Figure 2A. Given a large amount of training data, an MLP constantly adjusts the weights between neurons using the backpropagation algorithm so that the correct mapping from inputs to outputs can be established. Therefore, an MLP is usually trained in a supervised manner when a large amount of labeled data is available. In omics research, MLPs are widely used when features are not related in time or space.
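To make this concrete, the following is a minimal sketch of an MLP with a single hidden layer, using Keras as an example framework; the input dimension, layer sizes and binary classification task are illustrative assumptions, not a setup from any of the surveyed papers.

```python
# A minimal MLP sketch (Keras): one hidden layer, trained with backpropagation
# on labeled data. All sizes and the placeholder data are assumptions.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(100,)),                   # e.g. 100 input features
    keras.layers.Dense(32, activation="relu"),   # single hidden layer
    keras.layers.Dense(1, activation="sigmoid")  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(500, 100)                     # placeholder labeled data
y = np.random.randint(0, 2, size=(500, 1))
model.fit(X, y, epochs=5, batch_size=32, verbose=0)  # supervised training
```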

Figure 2: Some typical DNN structures. (A) An MLP structure that contains only one hidden layer. (B) A typical auto-encoder structure. (C) A typical RBM structure. (D) The basic structure and training process of a DBN network. The 1st step of training is to pre-train each layer of RBM along the solid arrow, and the 2nd step is fine-tuning the network along the dashed arrow based on the labeled data.

The auto-encoder is a neural network which reproduces its input signal as faithfully as possible: it captures the most important features of the input data and restores the original data from them. Its main idea is to treat the hidden layer of the neural network as an encoder and a decoder. After the input data are encoded and then decoded by the hidden layer, the decoded data should be consistent with the original input. One of the simplest auto-encoder structures is shown in Figure 2B. Auto-encoders routinely use greedy layer-wise pretraining to implement unsupervised learning. A trained auto-encoder is often used for dimensionality reduction or feature extraction, typically in situations where little labeled data is available. In general, it is difficult to obtain large amounts of labeled omics data, and omics data are usually high dimensional; therefore, auto-encoders are often used in omics research.
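As an illustration, here is a hedged Keras sketch of the auto-encoder idea described above; the 200-dimensional input and 16-dimensional code are arbitrary assumptions.

```python
# A minimal auto-encoder sketch (Keras): the hidden "code" layer compresses the
# input and the decoder reconstructs it; the input serves as its own target.
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(200,))                             # e.g. 200 features
code = keras.layers.Dense(16, activation="relu")(inputs)       # encoder
outputs = keras.layers.Dense(200, activation="sigmoid")(code)  # decoder

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, 200)                    # unlabeled placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)

encoder = keras.Model(inputs, code)              # reuse the encoder alone for
features = encoder.predict(X)                    # feature extraction: (1000, 16)
```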

The RBM proposed by Hinton et al. is a generative stochastic neural network containing a visible layer and a hidden layer. In an RBM, neurons in different layers are connected to each other, whereas neurons in the same layer are independent. Furthermore, the connections between neurons are bidirectional and symmetrical, as shown in Figure 2C. Based on an energy model and a probability equation, an RBM can establish the correct relationship model between the visible and hidden layers, thus extracting the features of the original data. RBMs can be trained by unsupervised methods, for instance, the contrastive divergence algorithm [26]. In omics research, RBMs are used in two main ways: encoding the data and then using supervised learning methods to classify or regress on it, as in the deep belief network (DBN) [4]; and learning a weight matrix and biases with which to initialize a backpropagation (BP) neural network.
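A hedged sketch of RBM feature learning is given below, using scikit-learn's BernoulliRBM (trained with persistent contrastive divergence); the data shape and hyperparameters are placeholders.

```python
# RBM feature learning with scikit-learn's BernoulliRBM.
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.rand(500, 64)   # placeholder data scaled to [0, 1]
rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)                    # unsupervised training

H = rbm.transform(X)          # hidden-layer activations as learned features
# rbm.components_ (the weight matrix) and rbm.intercept_hidden_ /
# rbm.intercept_visible_ could then initialize a BP neural network, as above.
```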

Having introduced the three basic components of DNNs, we now introduce a DNN model commonly used in omics research: the DBN [4]. One of the most classical DBN structures consists of several RBM layers and one BP layer, as shown in Figure 2D. Training this DBN model involves two steps: the 1st step is pretraining each RBM layer, and the 2nd step is fine-tuning the network based on the labeled data. DBNs have achieved considerable progress in omics research; common omics challenges such as protein residue–residue contact prediction [27] and RNA-binding protein site prediction [28] have been addressed with DBNs.

Figure 3: A simple CNN model structure diagram.

Figure 4: Schematic diagrams of the convolution and pooling operations. (A) Schematic diagram of the convolution operation. (B) Max-pooling and mean-pooling schematic diagram.

In general, DNNs are a conventional and effective class of neural network models. Although the best results are not guaranteed, this type of model can be adapted to almost all types of data. Therefore, DNNs are worth trying in omics research.

Convolutional neural networks

CNNs were first proposed by LeCun in 1989 [29]. In recent years, CNNs have been successfully applied in many fields, including speech recognition, face recognition, general object recognition, motion analysis and natural language processing. CNNs have also played an important role in omics research, including gene expression prediction, protein classification and gene structure prediction.

In general, CNNs consist of multiple convolution layers, pooling layers and a fully connected layer. A simple CNN structure is shown in Figure 3. The function of the convolution operation is to extract the various features of the data. In the process of convolution, a convolution kernel slides over the input window, the weight parameters on the kernel are multiplied by the corresponding input values, and the results are summed. The role of the pooling layer is to abstract the original feature signal, which greatly reduces the number of training parameters and can also reduce the degree of overfitting. Pooling operations fall into two categories: max-pooling, which selects the largest value in the corresponding window as the sampling result, and mean-pooling, which takes the average value of the corresponding window as the sampling result. The principles of the convolution and pooling operations are shown in Figure 4.
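The two operations can be sketched in a few lines of plain NumPy; this is an illustrative toy implementation of the multiply-and-sum convolution and the max/mean pooling described above, not code from any surveyed tool.

```python
# Toy convolution and pooling in NumPy, mirroring Figure 4.
import numpy as np

def conv2d(x, kernel):
    """Slide the kernel over x; at each position, multiply element-wise and sum."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: the max (or mean) of each size-by-size block."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            block = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = block.max() if mode == "max" else block.mean()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)   # placeholder 6x6 input
fm = conv2d(x, np.ones((3, 3)))                # 4x4 feature map
print(pool2d(fm, size=2, mode="max"))          # 2x2 pooled map
```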

Compared to other models, CNNs have an outstanding ability to analyze spatial information and require fewer data preprocessing steps. CNNs are therefore particularly good at handling image data, and encoding omics data into two-dimensional image-like matrices is often quite easy. CNNs have performed well in identifying various gene sequence structures, such as protein binding sites and enhancer sequences. In addition, CNNs' capacity for transfer learning is powerful when it is difficult to obtain a large amount of labeled omics data.

Recurrent neural networks

RNNs are a class of neural network models proposed in the late 1980s [30]. In recent years, RNNs have been increasingly applied in many fields, such as natural language processing, image recognition and speech recognition. In omics research, RNNs have various applications, such as determining the exon/intron boundaries of a gene and predicting RNA sequence-specific bias.

RNNs are so named because the input to the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step. A simple RNN model can thus be unfolded into a complex network. A specific RNN structure diagram and its time-dependency map are shown in Figure 5.

Figure 5: The structure of an RNN and the structure after unfolding in time. Ht is the hidden state at time t, and Ot represents the output at time t; U is the weight from the input layer to the hidden layer, which abstracts the original input into the hidden layer's input; W is the hidden-to-hidden weight, the memory controller of the network responsible for scheduling the memory; V is the weight from the hidden layer to the output layer, through which the features learned by the hidden layer pass to become the final output.

At present, the two most widely used RNN architectures are long short-term memory (LSTM) networks [31] and gated recurrent unit (GRU) networks [32]. These two networks are enhanced versions of the general RNN structure. Training an RNN model differs slightly from training DNN models: when an RNN is unfolded, the parameters W, U and V are shared across time steps, which is not the case in a traditional neural network. Furthermore, during training, the output of each step depends not only on the current network but also on the states of several previous steps. This increases the difficulty of training and makes problems such as exploding gradients more likely. Fortunately, improved networks such as LSTM and GRU are able to resolve such problems.
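The recurrence itself is compact; the following NumPy sketch implements the plain RNN forward pass described in Figure 5, with tanh as an assumed activation and arbitrary dimensions.

```python
# Forward pass of a plain RNN: H_t = tanh(U x_t + W H_{t-1}), O_t = V H_t.
# U, W and V are shared across all time steps, as noted above.
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 4))   # input -> hidden
W = rng.normal(size=(8, 8))   # hidden -> hidden (the "memory")
V = rng.normal(size=(3, 8))   # hidden -> output

def rnn_forward(xs):
    h = np.zeros(8)                  # initial hidden state
    outputs = []
    for x in xs:                     # one step per sequence element
        h = np.tanh(U @ x + W @ h)   # hidden state mixes input and memory
        outputs.append(V @ h)        # read out the output at this step
    return outputs

sequence = [rng.normal(size=4) for _ in range(5)]  # e.g. 5 encoded residues
outs = rnn_forward(sequence)
```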

In view of the powerful memory ability of RNNs, they can handle time-series problems well. Dependencies exist within most omics data, such as nucleotide and amino acid sequences. RNNs can automatically learn the correlations between the elements of a sequence and extract global sequence characteristics. Therefore, RNNs also occupy a key position in omics research.

So far, we have explained the three deep learning models most commonly used in omics research, namely DNNs, CNNs and RNNs. These three models can also be combined according to actual needs, which can achieve better performance. For example, the approach in [33] was to use a hybrid CNN-LSTM model for predicting the properties and functions of DNA. Compared with the CNN-based method, the performance of this hybrid model was significantly higher.

Application of deep learning in omics

At present, an increasing number of omics researchers have taken note of the value of deep learning technology. They have used deep learning to resolve problems in this field, achieving higher accuracy and faster speed than traditional methods. In this section, we briefly explain the latest advances from three perspectives: genomics, transcriptomics and proteomics.

Genomics

First, deep learning technology can be used to predict and identify the functional units in DNA sequences, including replication domains, transcription factor binding sites (TFBSs), transcription initiation sites, promoters, enhancers and gene deletion sites. In 2015, a novel hybrid architecture combining a pre-trained DNN and a hidden Markov model was developed [34] to identify distinct replication domain types; this model achieved significant improvements in recognition accuracy and robustness compared with previous methods. In 2016, a deep convolutional/highway MLP framework [35] was applied to classify genomic sequences according to TFBSs and achieved a good result: a median area under the curve (AUC) of 0.946. In [36], a CNN model was used to analyze the sequence characteristics of prokaryotic and eukaryotic promoters and to develop predictive models. This experiment performed excellently in classifying promoter and non-promoter sequences; for human promoters, the prediction accuracy reached 0.90 on TATA and 0.89 on non-TATA promoter sequences [36]. Similarly, the methods used in [37] effectively distinguished active enhancers and promoters by using a deep three-layer feed-forward neural network, achieving a maximum accuracy of 93.59% on GM12878 lymphoblastoid cells. In addition, for gene deletion, the experiment in [38] proposed a tool named CNNdel, which uses shallow CNNs to detect genomic deletions with real data from the 1000 Genomes Project; its results show that both accuracy and sensitivity improved compared with existing methods. In general, DNA sequence data are the primary training data for predicting and identifying functional units in DNA sequences. Moreover, according to our summary, the application of CNNs in such research is increasingly common, while the application of DNNs such as DBN and MLP is gradually decreasing. In the past two years, CNNs have taken a mainstream position in the prediction of promoters, enhancers, TFBSs and replication domains, the detection of gene deletions and the differentiation of introns and exons. In addition, the hybrid CNN+LSTM model is gradually coming into use.

Deep learning technology can also predict gene expression. This work usually involves predicting the expression of target genes, predicting gene function, modeling gene regulatory networks, etc. For example, in 2016, Chen et al. [39] used the microarray-based GEO dataset to train a DNN model to infer the expression of target genes; their method was significantly better than logistic regression, with a relative improvement of 15.33% in MAE. In the same year, a DNN model based on MLP and a stacked denoising auto-encoder [40] was proposed to predict gene expression from the genotypes of genetic variants, achieving better performance than Lasso and random forests. As another example, a novel hybrid convolutional and bi-directional LSTM RNN framework named DanQ [41] was proposed for predicting the functions of noncoding regions; compared with related models, DanQ performed better on 97.6% of targets in terms of the area under the precision-recall curve. With regard to gene regulatory networks, the approach taken in [42] was to use an RNN model to train the gene regulatory network; the results obtained are superior to all previous methods while maintaining robustness. On the whole, gene expression profiles, DNA sequence data with functional labels and histone modification data are all common training data for predicting gene expression. In predicting the expression of target genes and the function of genes, CNNs are currently the most commonly used deep learning models, followed by MLP. The application of deep learning to modeling gene regulatory networks is still relatively rare; in this area, RNNs are the most widely used deep learning model.

Using deep learning technology, we can also explore genomes and diseases in epigenetics and other fields. For example, in 2017, an MLP model was used to predict cancer risk and cancer survival rates [43]. Using clinical and molecular data from The Cancer Genome Atlas (TCGA) as training data, this work achieved performance comparable to the Cox elastic net. Another example, [44], used a deep CNN to predict the impact of sequence variation on the DNA methylation of proximal CpG sites, achieving an area under the receiver operating characteristic curve (AUROC) of 0.854. Similarly, a method named DeepCpG was proposed in [45] to predict methylation states in single cells. Using a joint RNN-CNN network, this method predicts methylation states in single cells accurately, and the parameters of the model can be interpreted, thereby providing insight into how sequence composition affects methylation variability. According to our summary, DNA sequences and their methylation status are commonly used as training data for predicting DNA methylation. In this work, CNNs are the most commonly used deep learning model; RNNs are sometimes used in this field, but usually combined with CNNs into a hybrid model rather than alone. In research on the association between genomics and disease, TCGA data, gene expression profiles and clinical data are common training data, and the most commonly used deep learning models are DNNs. Among them, auto-encoders are often used for feature extraction, while DBNs are often used to directly predict or classify diseases.

Transcriptomics

Using deep learning technology, we can analyze the structure of RNA sequences, including predicting RBP binding sites, alternative splicing sites and RNA types. For example, in 2015, a DBN was used to discover potential binding motifs and predict novel candidate binding sites [28]. This model uses RNA sequence, secondary structure and tertiary structure information as training data and achieves a 22% reduction in MRE compared to previous methods. As another example, the approach taken in [84] was to employ deep CNNs in a novel splice junction classification tool named DeepSplice. Compared with traditional machine learning methods, this method not only improves accuracy but also increases computational efficiency and flexibility. Furthermore, in 2017, an MLP neural network was constructed in [46] to correctly classify pre-miRNAs against other pseudo hairpins, achieving an accuracy of 0.968 ± 0.002. In general, RNA sequences, RNA secondary and tertiary structures, CLIP-seq data, etc. are very useful training data for predicting RNA structure. The most common models for predicting RBP binding sites and alternative splice sites are CNNs, DBN and the hybrid CNN+DBN model. In RNA classification, such as identifying whether an RNA sequence is a miRNA or a long non-coding RNA (lncRNA), the most commonly used deep learning models are MLP, RBM and RNNs, and judging from the frequency of use over the last two years, RNNs represented by LSTM will be applied more and more widely in this area.

Deep learning technology can also be used in other fields, such as the association between RNA and disease, RNA and drug design, etc. For example, in 2014, a classification model for disease was successfully trained [47] using a DBN. Using miRNA data as training data, this method increased the F1-measure on many kinds of cancer test data by 6–10% compared with common machine learning methods. Furthermore, the approach taken in [48] was to use transcriptome data with DNN models to identify the pharmacological properties of multiple drugs across different biological systems and conditions; ultimately, their approach achieved much better classification performance than previous support vector machine (SVM) methods. In general, training data commonly used in such research include RNA sequence data represented by miRNA-seq, transcriptome data in TCGA and RNA methylation status data. The most commonly used deep learning models are MLP and DBN; in addition, auto-encoders are sometimes used to extract features from the raw data.

Proteomics

Similarly, deep learning technology can identify protein structures, including protein secondary and tertiary structure prediction, protein model quality assessment, protein contact map prediction, etc. For example, the approach taken in [49] was to use stacked sparse auto-encoders to predict secondary structures and torsion angles. This method used the original amino acid sequence as the initial input and the protein secondary structure, backbone torsion angles and other features as iterative inputs; the model achieved an accuracy of nearly 82% in predicting secondary structures. Additionally, the approach taken in [36] was to evaluate protein model quality by replacing the support vector machine with a DNN, which increased the Pearson correlation coefficient from 0.85 to 0.9. The approach taken in [50] was to use an ultra-deep neural network, formed by combining two deep residual neural networks, to predict contacts by integrating both sequence conservation information and evolutionary coupling; it achieved the highest F1 score on free-modeling targets in the latest critical assessment of protein structure prediction (CASP). According to our summary, predicting protein structure generally requires amino acid sequences, low-dimensional protein structures and the physicochemical properties of amino acids as training data. In this work, the most commonly used deep learning models in the literature are DNNs: auto-encoders are often used to extract features from the input data, while MLP, DBN and RBM are often used as the core model for predicting protein structure. However, in the recent literature, CNNs and RNNs, especially RNNs, have gradually been applied as the main models for predicting protein structure and have achieved higher accuracy than DNNs.

Table 1

The application of deep learning in omics research

Classification | Problem to be solved | Deep learning model
Genomics | DNA sequence structure | MLP (DNN) [37, 55, 56]; SAE (DNN) [57]; DBN (DNN) [34, 58]; CNN [33, 36, 38, 59–68]; RNN [33, 63, 64, 69, 70]
Genomics | Gene expression regulation | MLP (DNN) [39, 40]; SAE (DNN) [40, 71]; CNN [41, 72–80]; RNN [41, 81]
Genomics | Gene expression and disease | MLP (DNN) [43, 82, 83]; SAE (DNN) [84, 85]; DBN (DNN) [86–88]
Genomics | Genotype and drugs | MLP (DNN) [89]
Genomics | Epigenomics (DNA methylation) | CNN [44, 45]
Transcriptomics | RNA sequence structure | MLP (DNN) [90–92]; SAE (DNN) [93]; DBN (DNN) [28, 46, 94]; CNN [94]; RNN [95–97]
Transcriptomics | RNA and drug classification | MLP (DNN) [48]; SAE (DNN) [98]
Transcriptomics | RNA and disease prediction | MLP (DNN) [99]; SAE (DNN) [100]; DBN (DNN) [47]; CNN [101]
Proteomics | Protein classification | RNN [102]
Proteomics | Protein structure | MLP (DNN) [103–106]; SAE (DNN) [49, 107, 108]; DBN (DNN) [27, 109–111]; CNN [50, 112–114]; RNN [112, 115]
Proteomics | Protein function | CNN [51, 116, 117]; LSTM (RNN) [52]
Proteomics | Drug design | CNN [118]
Proteomics | Intracellular distribution of proteins | SAE (DNN) [119]; CNN [54]; RNN [120, 121]
Proteomics | Protein interactions | MLP (DNN) [122]; RNN [53, 123]

Deep learning technology can also be used to predict protein function. For example, in [51], a CNN model was used to identify the functions of proteins. The experiment used protein tertiary structures as input and achieved an accuracy of 87.6%. Furthermore, the experiment in [52] used an LSTM model to predict the function of four kinds of proteins; using the original amino acid sequences as training data, the model achieved an accuracy of over 99%. In general, when predicting protein function, the amino acid sequence, protein structure and protein–protein interaction data are very useful information. CNNs and LSTM are the most important prediction models at present.

Deep learning technology can also be used to predict protein–protein interactions and protein subcellular localization, among many other tasks. For example, the approach taken in [53] was to use a stacked auto-encoder for sequence-based protein–protein interaction prediction; this model achieved an average accuracy of 97.19%. Additionally, the approach taken in [54] was to utilize a CNN to automate the detection of the cell compartment in which a fluorescently labeled protein is located. This model performs very well, achieving an accuracy of 91% per cell localization classification and 99% per protein. When predicting protein–protein interactions, amino acid sequences are the most common training data, and DNNs, CNNs and RNNs have all been used as prediction models in different studies; in comparison, CNNs are slightly ahead in both frequency of use and prediction accuracy. When studying protein subcellular localization, amino acid sequences and fluorescently labeled microscopy images are the commonly used training data. The application of deep learning in this field is still relatively rare: in related work, some researchers use CNNs as the core model, some use RNNs and some use stacked auto-encoders, but in comparison, the classification accuracy of CNNs is higher.

We have briefly described several typical examples of the application of deep learning in omics research. More specific work is included in Table 1. Of course, we believe that deep learning can achieve even greater success in the field of omics, as better training data, more advanced deep learning models, more reasonable deep learning architecture and parametric designs can further improve performance.

Table 2

Some open-source software and source code

Problem to be solved | Deep learning model | Source of the software or source code | Type
Predict RBP binding sites | DBN | https://github.com/thucombio/deepnet-rbp [28] | code
DNA-binding protein site prediction | CNN | http://cnn.csail.mit.edu [44] | code
Identify and distinguish replication domains based on replication timing profiles | DBN | https://github.com/wenjiegroup [34] | code
Identification of enhancer and promoter regions in the human genome | MLP (DNN) | https://github.com/yifeng-li [37] | code
Identification of enhancer and promoter regions in the human genome | MLP (DNN) | https://github.com/wenjiegroup/PEDLA [55] | code
Discriminate between bound and unbound sequences of TF binding data | CNN | https://github.com/kundajelab/keras/tree/keras_1 [61] | code
Predict binding of all TF/cell type pairs | CNN+RNN (LSTM) | http://github.com/uci-cbcl/FactorNet [33] | code
Predict conserved sequences | CNN | https://github.com/uci-cbcl/DeepCons [79] | code
Predict translation initiation sites | CNN | https://github.com/zhangsaithu/titer [66] | code
Annotate the pathogenicity of genetic variants | MLP (DNN) | https://cbcl.ics.uci.edu/public_data/DANN/ [83] | code
Gene expression data compendium for Pseudomonas aeruginosa | SAE (DNN) | https://github.com/greenelab/adage [56] | code
Predict the properties and functions of DNA sequences | CNN+RNN | http://github.com/uci-cbcl/DanQ [41] | code
Predict gene expression | CNN | https://github.com/QData/DeepChrome [76] | code
Predict gene expression | MLP (DNN) | https://github.com/uci-cbcl/D-GEX [27] | code
Histone ChIP-seq data denoising | CNN | https://github.com/kundajelab/coda [77] | code
Patient prognosis based on transcriptome data | MLP (DNN) | https://github.com/lanagarmire/cox-nnet [99] | code
Predict the effect of genome sequence variation on DNA methylation | CNN | http://cpgenie.csail.mit.edu [31] | code
Use the clinical and molecular data of TCGA to predict disease risk and survival | MLP (DNN) | https://github.com/CancerDataScience/SurvivalNet [30] | code
Predict protein contacts | CNN | http://raptorx.uchicago.edu/ContactMap/ [36] | webserver
Predict protein contacts | MLP (DNN) | http://compbio.robotics.tu-berlin.de/epsilon/ [105] | webserver
Predict protein contacts | MLP (DNN) | http://scratch.proteomics.ics.uci.edu/ [105] | webserver
Protein model quality assessment | MLP (DNN) | http://proq3.bioinfo.se/ [104] | webserver
Identify protein folding | DBN | http://iris.rnet.missouri.edu/dnfold [110] | webserver
Comprehensive website | — | http://www.softberry.com/ [36] | webserver
Comprehensive website | — | http://sparks-lab.org [49, 108] | webserver

Open-source software

Some excellent software has been developed for applying deep learning technology to omics research, especially in the past two years. In 2015, Alipanahi et al. [73] developed a tool called DeepBind to explore the sequence specificities of DNA- and RNA-binding proteins. In the same year, Zhou and Troyanskaya [72] developed a tool called DeepSEA for identifying the functional effects of noncoding variants. In 2016, Kelley et al. [74] developed a tool called Basset to understand the complex language of eukaryotic gene expression. At present, these three tools have become benchmarks in the field.

In addition to these three tools, many researchers who use deep learning to resolve problems in other areas have packaged their software or algorithm source code and uploaded it to the Internet for everyone to learn from and use. We can directly use their software or algorithms to expand our understanding of deep learning. We list these software packages and source codes in Table 2.

All of the applications we have listed are verified and available. Statistically, CNNs are the most widely applied model among these tools, while applications of RNNs are still few. In addition, using combined models, such as CNN+RNN, often improves performance.

Resolving omics problems using deep learning

In this section, we summarize the general steps involved in using deep learning technology to resolve an omics problem, including data acquisition, data preprocessing, encoding, model selection, model training and performance evaluation.

Data acquisition

A large amount of omics data is produced every year. Furthermore, with the establishment of various bioinformatics databases, data acquisition is no longer a difficult problem. Table 3 presents several databases commonly used in omics research.

Table 3

Some commonly used omics databases

Category | Database name | Website
Genome database | NCBI | https://www.ncbi.nlm.nih.gov/genome
Genome database | Ensembl | https://www.ensembl.org/
Genome database | UCSC | http://genome.ucsc.edu/
Nucleic acid sequence database | EMBL | http://www.ebi.ac.uk/embl/
Nucleic acid sequence database | GenBank | https://www.ncbi.nlm.nih.gov/genbank/
Nucleic acid sequence database | DDBJ | http://www.ddbj.nig.ac.jp
Protein sequence database | SWISS-PROT | http://cn.expasy.org/sprot
Protein sequence database | PIR | http://pir.georgetown.edu/
Protein structure database | PDB | http://www.rcsb.org/pdb
Protein structure classification database | SCOP | http://scop.mrc-lmb.cam.ac.uk/scop/
Protein structure classification database | CATH | http://www.cathdb.info/

It is important to note that omics data have their own industry standards. For example, 'fasta', 'fastq', 'gff2', 'bed', etc. are common data formats in omics. Obviously, these formats are difficult to feed directly into deep learning, but detailed descriptions of them are easy to find on the Internet. It may be necessary to know a scripting language, such as Perl, R or Python, to extract the information we need from such data; fortunately, the cost of learning these scripting languages is usually low.
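For instance, the 'fasta' format can be read in a few lines of Python with Biopython; the file name below is a placeholder.

```python
# Reading sequences from a FASTA file with Biopython ("example.fasta" is a
# placeholder path). Each record exposes an identifier and the raw sequence.
from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    sequence = str(record.seq)        # plain string, ready for encoding
    print(record.id, len(sequence))
```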

In omics, the input data for common deep learning models include sequencing data (DNA sequencing data, RNA sequencing data and amino acid sequencing data), gene expression data, image data (such as in situ hybridization images), the physicochemical properties of proteins or amino acids, contact maps, etc. Overall, it is necessary to download, extract and tidy these data up into a form that deep learning models can understand (such as vectors and matrices).

Data preprocessing

Although deep learning models can automatically learn the features of data, this does not mean that raw data can always be fed directly into a deep learning model. Proper preprocessing of the data can greatly improve the accuracy and speed of the deep learning model. The most commonly used data preprocessing methods in omics research include data cleaning, normalization and dimensionality reduction.

(1) Data cleaning: The omics data we obtain may contain many missing values, erroneous values and noise, which can cause serious issues in model training. Therefore, we need to improve the quality of the data as much as possible. Data cleansing is usually undertaken prior to encoding and mainly involves handling missing values and outliers, removing duplicate data and processing noisy data. For missing values or outliers, we can fill in the incomplete data using the k-nearest neighbor algorithm, regression, decision tree analysis and other methods. We can deal with noisy data through clustering, regression and binning. For duplicate data, we can eliminate records whose similarity to another record exceeds a threshold.

Data cleansing is a time-consuming and labor-intensive task, and it is difficult to judge which method is best because of the different types of data involved. Many omics researchers may not understand the fundamentals of machine learning or how to implement the various algorithms mentioned above. Fortunately, some software packages, such as OpenRefine [124] and DataKleenr, are well suited to data cleansing, and learning to use them is much easier than implementing the cleansing algorithms ourselves.
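For readers who prefer scripting, here is a hedged sketch of two of the cleaning steps above using pandas and scikit-learn; the tiny data frame is a placeholder.

```python
# Basic cleaning: drop exact duplicates, then fill missing values using the
# k-nearest neighbor approach mentioned above.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "gene_a": [1.2, np.nan, 3.1, 3.1, 0.4],
    "gene_b": [0.5, 0.7, 2.2, 2.2, np.nan],
})

df = df.drop_duplicates()            # remove duplicate rows
imputer = KNNImputer(n_neighbors=2)  # impute NaNs from the nearest samples
cleaned = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```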

(2) Normalization: Normalization involves rescaling the collected data to a certain range. A good normalization method can alleviate the problem of becoming trapped in local optima. Normalization is normally undertaken after the data are encoded. We introduce two common normalization methods: min-max normalization and zero-mean normalization. Which one to choose depends on the type of data to be dealt with.

When the numerical values of our omics data are concentrated in a narrow range, do not follow a normal distribution and no distance or covariance operations are involved in the subsequent calculation, we can use the first normalization method. For example, we can use this method to process image data, such as in situ hybridization images. We perform the following operation on the data for each dimension:

x' = (x - min) / (max - min)    (1)

where max represents the maximum of the sample data and min represents the minimum of the sample data.
In some cases, the omics data that we obtain follow a normal distribution which we do not want to disturb, and the distance between samples is also important for the classification result. In this case, we generally adopt the 2nd normalization method, which is in fact the most commonly used. For example, the physicochemical properties of amino acids and gene expression profiling data often take this normalization approach. Zero-mean normalization is calculated as follows:

x' = (x - u) / σ    (2)

where u is the mean of all samples and σ is the standard deviation of all samples.
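Both formulas are one-liners in NumPy; a minimal sketch with placeholder values:

```python
# Equations (1) and (2) applied to a toy vector.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])

x_minmax = (x - x.min()) / (x.max() - x.min())  # equation (1): rescale to [0, 1]
x_zscore = (x - x.mean()) / x.std()             # equation (2): zero mean, unit variance
```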

(3) Dimensionality reduction: In general, omics data are high dimensional. Properly reducing the data dimensionality can remove irrelevant features and so improve training. Many deep learning models, such as the auto-encoder, can themselves perform dimensionality reduction, and compared with traditional machine learning methods, dimensionality reduction by deep learning can retain more non-linear features. In addition, to reduce the amount of computation, conventional dimensionality reduction methods are also sometimes used in omics research; for example, principal component analysis (PCA) was used to reduce the dimensions of gene expression profile data in [79].
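A hedged sketch of PCA with scikit-learn follows; the matrix shape and the choice of 50 components are assumptions for illustration.

```python
# PCA-based dimensionality reduction, e.g. for a gene expression matrix.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 5000)               # 200 samples x 5000 genes (placeholder)
pca = PCA(n_components=50)                  # keep the 50 strongest components
X_reduced = pca.fit_transform(X)            # shape (200, 50)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```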

Encoding

The form of the input data has profound implications for the ultimate learning outcome of a deep learning model. In general, the most widely accepted input form for deep learning and conventional machine learning methods alike is a vector or matrix. In omics, the most common data types are sequence data, such as DNA sequences, RNA sequences and amino acid sequences. For sequence data, we often use the following three methods to encode them into matrix form:

  1. One hot encoding: This is the most common encoding method in the existing literature. It can be used for both nucleotide and amino acid sequences. In the case of a DNA sequence, the sequence ATGCT after one hot encoding is shown in Figure 6A, where the black blocks represent 1 and the white blocks represent 0 (see the sketch after this list).

  2. Position-specific scoring matrix (PSSM): This encoding method can be used for both amino acid and nucleotide sequences. The matrix gives the probability of a given base or amino acid being present at a specific position. Some software packages, such as the position-specific iterative basic local alignment search tool (PSI-BLAST), can generate a PSSM. A simple PSSM is shown in Figure 6B.

  3. PAM and BLOSUM matrices: The point accepted mutation (PAM) matrix and the blocks substitution matrix (BLOSUM) are scoring matrices for sequence similarity. These two encoding methods are mainly used for amino acid sequences. Some experiments, such as [107], have compared the two in detail; currently, the BLOSUM matrix is the more frequently used of the two. Some mature tools, such as BLAST [125], provide good support for both matrices.
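As promised in item 1, here is a minimal sketch of one hot encoding for the DNA sequence ATGCT from Figure 6A; the A/C/G/T row order is an assumption.

```python
# One hot encoding of a DNA sequence: each base becomes a 4-dimensional
# indicator column, matching Figure 6A.
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    mat = np.zeros((len(BASES), len(seq)), dtype=int)
    for j, base in enumerate(seq):
        mat[BASES.index(base), j] = 1   # a single 1 per column marks the base
    return mat

print(one_hot("ATGCT"))   # rows A, C, G, T; columns follow the sequence
```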

Figure 6: Two common encoding methods. (A) One hot encoding of bases, where black blocks represent 1 and white blocks represent 0. (B) A simple PSSM, where each number is the probability that a base appears at that location.

In addition to these three common encoding methods, there are also methods specific to protein sequences, such as the autocovariance method and the conjoint triad method [126]. Among them, the autocovariance method, which describes how variables at different positions are correlated and interact, has been widely used for encoding protein sequences [127].

Beyond sequence data, some omics data, such as contact maps and image data, already take the form of a two-dimensional matrix and can be used directly by a deep learning model. Other data, such as gene expression data and the physicochemical properties of proteins or amino acids, take the form of numerical vectors, which are simple to assemble into a matrix.

The joint encoding of several data types is also often used in omics research. For example, the Atchley factor method [128] is one of the most frequently used such methods; it characterizes an amino acid by five joined numerical factors covering secondary structure, polarity, volume, codon diversity and electrostatic charge. It is also common to combine the various physicochemical properties of amino acids and the PSSM into one matrix [49, 50, 98, 129].

Model selection

So far, no universal deep learning model that can solve all problems has been developed. As mentioned above, each model has its own advantages. In omics research, auto-encoders are mainly used for dimensionality reduction and denoising, and RBM is mainly used for feature learning; these two models are rarely used alone and are usually combined with other models to solve a problem. MLP and DBN are suitable for almost all omics problems, but there is no guarantee that their final effect will be better than that of other models. CNNs can handle most grid-like data in omics, such as image data and encoded sequence data. RNNs are mainly used for sequence learning problems. For a detailed understanding of which models suit which omics problems, please refer to our discussion in the section Application of deep learning in omics.

It is worth mentioning that although the deep learning models we introduced in Section II are the most commonly used in omics research, we should not rely on any single model. First, we should exploit the advantages of various models and combine them to build a more powerful network. For example, [41] combined a CNN and an LSTM to predict the properties and functions of DNA sequences and achieved impressive results. Second, we should pay attention to new techniques. For example, deep residual learning [130], proposed in 2015, avoids the vanishing gradient problem while increasing the number of layers in the network, thus facilitating greater accuracy. This model is well worth trying in omics research.
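To make the idea of combining models concrete, the sketch below builds a small CNN + LSTM hybrid with the Keras API, in the spirit of (but not identical to) the architecture used in [41]; the sequence length, filter counts and layer sizes are illustrative assumptions:

```
from tensorflow.keras import layers, models

seq_len = 1000  # one-hot-encoded DNA windows of 1000 bp (assumed)
model = models.Sequential([
    layers.Input(shape=(seq_len, 4)),  # 4 channels: A, C, G, T
    layers.Conv1D(32, kernel_size=26, activation="relu"),  # motif scanner
    layers.MaxPooling1D(pool_size=13),
    layers.Dropout(0.2),
    layers.Bidirectional(layers.LSTM(32)),  # long-range dependencies
    layers.Dense(1, activation="sigmoid"),  # binary functional label
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Here the convolutional layers act as learned motif scanners, while the recurrent layer models dependencies between motifs along the sequence.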

Model training

The training of a deep learning model has never been simple. To mitigate the challenges involved, it is first important to consider the hardware. Training a network may take a long time because of the huge number of parameters in a deep learning model. When training a large network, a graphics processing unit (GPU) is recommended to accelerate the training process, and many deep learning frameworks now support GPU acceleration.

Second, we should pay attention to the allocation of the dataset. In general, the samples are divided into three parts: a training set, a validation set and a testing set. The training set is used to train the model; the validation set is used to determine the network structure and the parameters that control the complexity of the model; the testing set is used to assess the performance of the final model. Typically, 70% of the samples are used for training and validation and 30% for testing, but this proportion is not absolute and can be adjusted according to the sample size.
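A minimal sketch of such an allocation, using scikit-learn's train_test_split on placeholder data (the 70/30 and 80/20 proportions follow the rule of thumb above):

```
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)            # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)  # placeholder binary labels

# Hold out 30% for testing, then carve a validation set out of the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.20, random_state=42,
    stratify=y_trainval)
```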

The choice of the various functions in the network is also a key issue when building models. Common activation functions (sigmoid, tanh, softmax, ReLU, maxout) and common loss functions (the mean square error loss, the log-likelihood loss and the cross-entropy loss) are all important factors in the final training result. Activation functions fall into two categories: output layer activation functions and hidden layer activation functions. Consider the output layer first. For a simple regression task, a linear activation paired with the mean square error loss is usually sufficient. For a binary classification task, sigmoid or tanh is usually paired with the cross-entropy loss. For multi-class classification tasks, softmax is usually paired with the log-likelihood loss. Then consider the hidden layers: because sigmoid and tanh suffer from vanishing or exploding gradients, ReLU and maxout have become the most widely used activation functions.
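These pairings can be written down directly in Keras; the input size and hidden width below are arbitrary placeholders:

```
from tensorflow.keras import layers, models

def network(out_units, out_activation):
    return models.Sequential([
        layers.Input(shape=(100,)),
        layers.Dense(64, activation="relu"),  # ReLU in the hidden layer
        layers.Dense(out_units, activation=out_activation),
    ])

regression = network(1, "linear")
regression.compile(optimizer="adam", loss="mse")  # mean square error loss

binary = network(1, "sigmoid")
binary.compile(optimizer="adam", loss="binary_crossentropy")  # cross-entropy

multiclass = network(5, "softmax")  # e.g. 5 classes
multiclass.compile(optimizer="adam",
                   loss="categorical_crossentropy")  # log-likelihood loss
```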

The dropout technique is used in a considerable proportion of omics research. Training a large network on very little data, or on noisy data, easily causes overfitting. To address this problem, Hinton [131] proposed dropout: in each training iteration, some neuron units are temporarily dropped from the network with a certain probability, which improves the generalization ability of the network. The principle, though simple, is highly effective and worth trying. Beyond dropout, there are other ways to prevent overfitting, such as early stopping and weight decay; these methods do not conflict and can be combined.
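A minimal sketch combining dropout with early stopping in Keras, on placeholder data (the dropout rate of 0.5 is one of the common values listed in Table 4):

```
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

X = np.random.rand(500, 100)           # placeholder data
y = np.random.randint(0, 2, size=500)

model = models.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                # randomly drop half the units per step
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

stop = EarlyStopping(monitor="val_loss", patience=5,
                     restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[stop], verbose=0)
```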

In addition, many parameters of a deep learning model still need to be set and adjusted by hand, such as the learning rate and the weight initialization. We summarize some common parameter settings in Table 4. Many algorithms also support automatic hyperparameter tuning, such as grid search, random search and Bayesian optimization [132], which alleviate the difficulty of tuning to a certain extent.
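As a sketch of automated tuning, scikit-learn's ParameterSampler draws random configurations from a search space; the space below mirrors the common settings of Table 4, and evaluate(params) is a hypothetical stand-in for training and scoring one model:

```
from sklearn.model_selection import ParameterSampler

space = {"learning_rate": [0.1, 0.01, 0.001],
         "batch_size":    [64, 128, 256],
         "dropout_rate":  [0.3, 0.5, 0.7]}

for params in ParameterSampler(space, n_iter=5, random_state=0):
    # evaluate(params) would build a model with these hyperparameters,
    # train it and return a validation score (hypothetical helper).
    print(params)
```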

Table 4

Some common hyperparameter settings

Learning rate: initial value = 0.1, with Adam for dynamic adjustment
Parameter optimization method: SGD; momentum; Adagrad; Adadelta; RMSprop; Adam
Weight initialization: Gaussian; Xavier; MSRA
Batch size: 64; 128; 256
Number of nodes: e.g. 16, 32, 128; no more than the number of samples
Dropout rate: 0.3; 0.5; 0.7

Note: bold font represents common values.

Performance evaluation

K-fold cross validation is usually the first step in checking the accuracy of an algorithm. Take 10-fold cross-validation as an example: the dataset is divided into 10 parts; nine of them are used as training data and one as testing data. Each part serves as the testing set exactly once, so the model is tested 10 times in total, and the average accuracy over the 10 runs is used as an estimate of the accuracy of the algorithm.
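A minimal 10-fold cross-validation sketch on placeholder data; logistic regression stands in here for whatever model is actually being evaluated:

```
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X = np.random.rand(200, 20)            # placeholder data
y = np.random.randint(0, 2, size=200)

scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # fold accuracy
print(np.mean(scores))  # average accuracy over the 10 folds
```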

After this step, there are many criteria for measuring the performance of a deep learning model, such as accuracy, F1-measure, etc. We summarize the most commonly used criteria for omics research and deep learning, as well as their computational methods, in Table S1 in the Supplementary Materials.
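For reference, most of these criteria are available off the shelf, for example in scikit-learn; the labels and scores below are toy values:

```
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0]               # toy ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1]               # hard predictions
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6]  # predicted probabilities

print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))
```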

Deep learning framework

Although deep learning technology has many great advantages, especially in omics research, it is difficult for an omics researcher without a computing background to learn the skills necessary to apply it. Fortunately, with the great success of deep learning technology, many well-known companies and institutions, such as Google and Microsoft, have released deep learning frameworks. We only need to learn how to build a model on top of these frameworks, which is much simpler than programming a neural network from scratch.

Many frameworks are now publicly available, and choosing one that suits our purpose can greatly increase productivity. We summarize the properties of some common frameworks in Table 5 and their main advantages and disadvantages in Table 6, from which an initial determination of a suitable framework can be made.

Table 5

Comparison of deep learning libraries

Caffe. Creator: UC Berkeley. Interfaces: C++, Python, MATLAB. Suitable models: CNN, RNN. OS: Linux, Windows, OSX, Android. GitHub stars: 20212.

MXNet. Creator: CMU, UW and Microsoft. Interfaces: C++, R, Python, Scala, Matlab, JavaScript, Go, Julia. Suitable models: CNN, RNN. OS: Linux, Windows, OSX, Android. GitHub stars: 11170.

Torch. Creator: Ronan Collobert et al. Interfaces: Lua, LuaJIT, C. Suitable models: DNN, CNN, RNN. OS: Linux, Windows, OSX, Android, iOS. GitHub stars: 7279.

Deeplearning4j. Creator: Skymind. Interfaces: Java, Scala, Clojure. Suitable models: DNN, CNN, RNN. OS: Linux, Windows, OSX, Android. GitHub stars: 7203.

TensorFlow. Creator: Google. Interfaces: C++, Python, Go, Java. Suitable models: DNN, CNN, RNN. OS: Linux, OSX, Windows. GitHub stars: 68800.

Theano. Creator: Université de Montréal. Interfaces: Python. Suitable models: DNN, CNN, RNN. OS: Linux, OSX, Windows. GitHub stars: 6914.

CNTK. Creator: Microsoft. Interfaces: NDL, C++, Python. Suitable models: CNN, RNN. OS: Linux, OSX, Windows. GitHub stars: 12396.

Neon. Creator: Nervana Systems. Interfaces: Python. Suitable models: DNN, CNN, RNN. OS: OSX, Linux. GitHub stars: 3200.

Keras. Creator: François Chollet. Interfaces: Python. Suitable models: DNN, CNN, RNN. OS: Linux, Windows, OSX. GitHub stars: 19589.

Multi-GPU training is supported by seven of the nine frameworks, distributed training by five of the nine and cloud computing by three of the nine.

Notes: While the various frameworks support different deep learning models, each framework excels at different models. In terms of CNN modeling capabilities, Caffe is the best; in terms of RNN modeling capabilities, CNTK is the best.

Table 6

Advantages and disadvantages of some deep learning frameworks

TensorFlow. Advantages: (1) flexible portability; (2) fast compilation; (3) powerful supporting software, such as TensorFlow Serving; (4) strong technical support services; (5) excellent overall architecture. Disadvantages: (1) documentation and interfaces are insufficiently clear; (2) difficult to debug; (3) high memory footprint.

Caffe. Advantages: (1) easy to use; (2) fast training; (3) highly modular components. Disadvantages: (1) inadequate support for RNNs; (2) interfaces of different versions are incompatible; (3) no support for distributed training.

Keras. Advantages: (1) complete documentation; (2) easy to learn and use; (3) updated quickly; (4) highly modular components. Disadvantages: (1) insufficiently flexible; (2) cannot directly use multiple GPUs.

Theano. Advantages: (1) highly flexible; (2) low API learning cost; (3) good computational stability. Disadvantages: (1) difficult to learn; (2) no underlying C++ interface; (3) models are inconvenient to deploy; (4) debugging error messages are hard to understand.

Torch. Advantages: (1) easy to use; (2) highly modular components; (3) convenient GPU use; (4) the high efficiency of the Lua language. Disadvantages: (1) Lua is not commonly used; (2) Torch's data file format is unusual and needs conversion.

MXNet. Advantages: (1) good performance; (2) highly flexible; (3) memory-efficient; (4) supports many language bindings. Disadvantages: (1) poor API documentation; (2) difficult to learn.

CNTK. Advantages: (1) very good performance; (2) good scalability. Disadvantages: (1) difficult to install; (2) fewer learning materials than other frameworks.

Speed is also an important factor in measuring the performance of a deep learning framework. Many researchers have already analyzed and compared the performance of several frameworks [133, 134]. In general, different frameworks have different strengths in different network models and different hardware conditions. In terms of overall performance, CNTK and MXNet may perform better than other frameworks.

Challenges and opportunities

In omics research, deep learning technology faces the following difficulties:

Data volume: Deep learning models need more data than traditional models to avoid overfitting. If the amount of data is too small, a deep learning model may perform worse than a traditional machine learning algorithm. Although a large amount of omics data is generated every year, insufficient data can still be a problem. For example, owing to privacy or sample size limitations, gene expression profiles for some diseases remain limited and may not be sufficient for deep learning models. Such data also aggravate the problem of data imbalance, resulting in inaccurate training.

Data quality: The essence of deep learning is to learn rules from the input data; the quality of learning therefore depends on the quality of the input data. However, most omics data are obtained through experiments, and it is difficult to guarantee their accuracy.

Computation costs: The structure of a deep learning model is complex. It has a vast number of parameters, and training them requires forward propagation, backpropagation and a series of other complex computations. Deep learning therefore needs strong computing power: training a deep model typically requires at least one server with multiple GPU cards, and many purely biological labs or individuals may not have such resources.

The ‘black box’ problem: The model learned by deep learning behaves like a black box that most people cannot easily understand, and it cannot be proven or falsified by mathematical methods. Sometimes, therefore, we cannot understand the underlying principles even when we obtain correct results. For example, a deep learning model can identify a gene expression profile as that of a cancer, but it does not explain why that profile indicates cancer. In omics, this ability to explain is crucial to advancing the discipline.

Model selection and training: Because deep learning technology develops rapidly, there are many models to choose from, and it is often not easy to choose one suitable for a specific problem. In addition, hyperparameters are difficult to set and adjust, and very small changes in them can change the outcome of training.

Although we have mentioned many deficiencies, we need not be pessimistic about deep learning technology; the many successful applications mentioned above have proven its usability in omics. For the problem of insufficient sample data, new techniques such as zero-shot learning [135], one-shot learning [136] and generative adversarial networks [137] can resolve it to a certain extent. For the imbalance in omics data, methods such as resampling and cost-sensitive learning [138] provide solutions. For poor data quality, the data cleansing methods explained above can improve the quality of the data. For the ‘black box’ problem, converting the black box into a white box is also promising as the technology develops. For example, Lanchantin et al. [139] proposed the Deep Motif Dashboard, a suite of visualization strategies for extracting motifs or sequence patterns from DNN models for TFBS classification, which mitigates the ‘black box’ problem to a certain extent. As for the remaining problems, such as the amount of computation involved and the difficulty of tuning hyperparameters, these can be addressed through cooperation between organizations and between experts in different fields.

In terms of future development, reinforcement learning [140], incremental learning [141] and transfer learning [142] will be increasingly applied in deep learning research on omics. Reinforcement learning, which is closer to human learning, will greatly improve the self-learning ability of artificial intelligence, although at present it is mainly used in robotics [143]. Incremental learning mainly addresses the problem of repetitive training: a large amount of omics data is produced every year and the total volume keeps growing, so retraining on all the data is time-consuming and storing historical data consumes storage resources, and incremental learning is a good solution to this problem. Transfer learning will greatly alleviate the problem of small omics samples and can also greatly reduce training time.

Conclusion

In conclusion, deep learning technology is certainly suitable for resolving omics problems. The combination of omics and deep learning is new, and for this reason we have summarized the relevant recent work and provided a guideline for this topic. We have introduced the deep learning models that are commonly used in omics research and have summarized some recent omics research. In addition, we have discussed the steps involved in using deep learning technology and some well-known deep learning frameworks, which until now have not been systematically discussed in the existent literature on this topic. A researcher who is interested in this field can gain a general understanding of deep learning from our survey. Although deep learning technology does have limitations in its application to omics, these are being resolved. In the future, deep learning technology will play an increasingly important role in this field.

Key Points

  • With the advent of the big data era, a huge amount of high dimensional and complex structured omics data has rendered conventional machine learning algorithms inadequate. Fortunately, deep learning technology can contribute toward resolving these challenges.

  • We introduce several deep learning models and discuss several research areas which have combined omics and deep learning in recent years.

  • Furthermore, we summarize the general steps involved in using deep learning which have not yet been systematically discussed in the existent literature on this topic.

  • The features of some mainstream deep learning frameworks are discussed in detail in this article. In addition, we put forward our own opinions on the opportunities and challenges of deep learning in omics research.

  • In general, our review provides a very detailed guideline for omics researchers about deep learning technology.

Funding

This work was supported by National Key R&D Program of China (2018YFC090002, 2017YFB0202602, 2017YFC1311003, 2016YFC1302500, 2016YFB0200400, 2017YFB0202104); National Natural Science Foundation of China (NSFC) (61772543, U1435222, 61625202, 61272056); the Funds of State Key Laboratory of Chemo/Biosensing and Chemometrics; the Fundamental Research Funds for the Central Universities; Guangdong Provincial Department of Science and Technology (2016B090918122).

Zhiqiang Zhang is a master's student at the National University of Defense Technology. His research interests include bioinformatics, high performance computing and artificial intelligence.

Yi Zhao is a professor at the Chinese Academy of Sciences. His research interests include bioinformatics and intelligent information processing.

Xiangke Liao is a professor at the National University of Defense Technology and a member of the Chinese Academy of Engineering. His research interests include machine learning, large-scale scientific computing and quantum computing.

Wenqiang Shi received his PhD last year from the National University of Defense Technology and studied for a period at the University of British Columbia. His research interests include bioinformatics and artificial intelligence.

Kenli Li is a professor at Hunan University. His research interests include cloud computing, biological computing and big data management.

Quan Zou is a professor at Tianjin University and the University of Electronic Science and Technology of China. He is a senior member of IEEE and ACM. His research interests include bioinformatics, large-scale data mining and parallel computing applications.

Shaoliang Peng is a professor at the National University of Defense Technology and Hunan University. He is also the executive director of the National Supercomputing Center in Changsha. His research interests include high performance computing, bioinformatics, big data, virtual screening and biology simulation.

References

1. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 1943;5:115–33.
2. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 1958;65:386–408.
3. Werbos P. Beyond regression: new tools for prediction and analysis in the behavioral science. PhD diss., Harvard University, 1974;29:65–78.
4. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18:1527–54.
5. Miotto R, Li L, Kidd BA, et al. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6:26094.
6. Cheng Y, Wang F, Zhang P, et al. Risk prediction with electronic health records: a deep learning approach. In: SIAM International Conference on Data Mining. New York, USA: ACM, 2016, pp. 432–40.
7. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316:2402.
8. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
9. Suk HI, Shen D. Deep learning-based feature representation for AD/MCI classification. In: MICCAI International Conference on Medical Image Computing & Computer-Assisted Intervention. Tokyo, Japan: Springer, 2013, p. 583.
10. Shen D, Wu G, Suk H. Deep learning in medical image analysis. Annu Rev Biomed Eng 2017;19:221.
11. Meng H, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: IEEE International Conference on Computational Science and Engineering. Los Angeles, USA: IEEE Computer Society, 2015, pp. 7–12.
12. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm stimuli from electroencephalography recordings. In: Advances in Neural Information Processing Systems. Massachusetts, USA: MIT Press, 2014, pp. 1449–57.
13. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: International Conference on Intelligent Computing. Hong Kong, China: Springer, 2014, pp. 203–10.
14. Consortium EP. The ENCODE (ENCyclopedia of DNA elements) project. Science 2004;306:636–40.
15. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 2012;41:D991–5.
16. Angermueller C, Pärnamaa T, Parts L, et al. Deep learning for computational biology. Mol Syst Biol 2016;12:878.
17. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
18. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2016;18(5):851–69.
19. Miotto R, Wang F, Wang S, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 2017;bbx044.
20. Pastur-Romay LA, Cedrón F, Pazos A, et al. Deep artificial neural networks and neuromorphic chips for big data analysis: pharmaceutical and bioinformatics applications. Int J Mol Sci 2016;17:1313.
21. Ravì D, Wong C, Deligianni F, et al. Deep learning for health informatics. IEEE J Biomed Health Inform 2017;21:4–21.
22. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inform 2016;35:3–14.
23. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39:43–62.
24. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313:504–7.
25. Hinton GE, Sejnowski TJ. Learning and relearning in Boltzmann machines. In: Parallel Distributed Processing, Vol. 1. Massachusetts, USA: MIT Press, 1986.
26. Carreira-Perpinan MA, Hinton GE. On contrastive divergence learning. In: AISTATS. 2005, pp. 33–40.
27. Eickholt J, Cheng J. Predicting protein residue–residue contacts using deep networks and boosting. Bioinformatics 2012;28:3066–72.
28. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;44:e32.
29. LeCun Y, Boser B, Denker JS, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput 1989;1:541–51.
30. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1:270–80.
31. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80.
32. Cho K, Van Merriënboer B, Bahdanau D, et al. On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
33. Quang D, Xie X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. 2017. bioRxiv:151274.
34. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;32:641–9.
35. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133, 2016.
36. Umarov RK, Solovyev VV. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks. PLoS One 2017;12:e0171410.
37. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 2018;19:202.
38. Wang J, Ling C, Gao J. A high-precision shallow convolutional neural network based strategy for the detection of genomic deletions. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 1806–13.
39. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;32:1832–9.
40. Xie R, Quitadamo A, Cheng J, et al. A predictive model of gene expression using a deep learning framework. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 676–81.
41. Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 2016;44:e107.
42. Raza K, Alam M. Recurrent neural network based hybrid model of gene regulatory network. Comput Sci 2014;24:522–9.
43. Yousefi S, Amrollahi F, Amgad M, et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci Rep 2017;7:11707.
44. Zeng H, Gifford DK. Predicting the impact of non-coding variants on DNA methylation. Nucleic Acids Res 2017;45(11):e99.
45. Angermueller C, Lee HJ, Reik W, et al. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol 2017;18:67.
46. Thomas J, Thomas S, Sael L. DP-miRNA: an improved prediction of precursor microRNA using deep learning model. In: 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). Piscataway, NJ, USA: IEEE, 2017, pp. 96–9.
47. Ibrahim R, Yousri NA, Ismail MA, et al. Multi-level gene/MiRNA feature selection using deep belief nets and active learning. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway, NJ, USA: IEEE, 2014, pp. 3957–60.
48. Aliper A, Plis S, Artemov A, et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol Pharm 2016;13:2524–30.
49. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
50. Wang S, Sun S, Li Z, et al. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 2017;13:e1005324.
51. Tavanaei A, Maida AS, Kaniymattam A, et al. Towards recognition of protein function based on its structure using deep convolutional networks. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 145–9.
52. Liu X. Deep recurrent neural network for protein function prediction from sequence. 2017. Preprint arXiv:1701.08318.
53. Sun T, Zhou B, Lai L, et al. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017;18:277.
54. Pärnamaa T, Parts L. Accurate classification of protein subcellular localization from high-throughput microscopy images using deep learning. G3: Genes, Genomes, Genetics 2017;7:1385–92.
55. Liu F, Li H, Ren C, et al. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci Rep 2016;6:28517.
56. Li Y, Chen C-Y, Wasserman WW. Deep feature selection: theory and application to identify enhancers and promoters. In: RECOMB. Berlin, Germany: Springer, 2015, pp. 205–17.
57. Yu N, Yu Z, Pan Y. A deep learning method for lincRNA detection using auto-encoder algorithm. BMC Bioinformatics 2017;18:511.
58. Bu H, Gan Y, Wang Y, et al. A new method for enhancer prediction based on deep belief network. BMC Bioinformatics 2017;18:418.
59. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32:i121–7.
60. Denas O, Taylor J. Deep modeling of gene expression regulation in an erythropoiesis model. In: Representation Learning, ICML Workshop. New York, USA: ACM, 2013.
61. Shrikumar A, Greenside P, Kundaje A. Reverse-complement parameter sharing improves deep learning models for genomics. 2017. bioRxiv:103663.
62. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. 2016. Preprint arXiv:1605.01133.
63. Singh S, Yang Y, Poczos B, et al. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks. 2016. bioRxiv:085241.
64. Hassanzadeh HR, Wang MD. DeeperBind: enhancing prediction of sequence specificities of DNA binding proteins. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 178–83.
65. Min X, Chen N, Chen T, et al. DeepEnhancer: predicting enhancers by convolutional neural networks. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 637–44.
66. Zhang S, Hu H, Jiang T, et al. TITER: predicting translation initiation sites by deep learning. Bioinformatics 2017;33:i234–42.
67. Zhou J, Lu Q, Xu R, et al. CNNsite: prediction of DNA-binding residues in proteins using convolutional neural network with sequence features. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 78–85.
68. Min X, Zeng W, Chen S, et al. Predicting enhancers with deep convolutional neural networks. BMC Bioinformatics 2017;18:478.
69. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. Preprint arXiv:1512.05135, 2015.
70. Yang B, Liu F, Ren C, et al. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 2017;33:1930–36.
71. Tan J, Hammond JH, Hogan DA, et al. ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 2016;1:e00025–15.
72. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning–based sequence model. Nat Methods 2015;12:931.
73. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831–8.
74. Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 2016;26:990–9.
75. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16:147.
76. Singh R, Lanchantin J, Robins G, et al. DeepChrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics 2016;32:i639–48.
77. Koh PW, Pierson E, Kundaje A. Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics 2017;33:i225–33.
78. Poplin R, Newburger D, Dijamco J, et al. Creating a universal SNP and small indel variant caller with deep neural networks. 2016. bioRxiv:092890.
79. Li Y, Quang D, Xie X. Understanding sequence conservation with deep learning. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. New York, USA: ACM, 2017, pp. 400–6.
80. Cuperus JT, Groves B, Kuchina A, et al. Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences. Genome Res 2017;27:2015–24.
81. Raza K, Alam M. Recurrent neural network based hybrid model for reconstructing gene regulatory network. Comput Biol Chem 2016;64:322–34.
82. Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. In: Pacific Symposium on Biocomputing. New Jersey, USA: World Scientific, 2016, p. 219.
83. Quang D, Chen Y, Xie X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 2014;31:761–3.
84. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning. New York, USA: ACM, 2013.
85. Tan J, Ung M, Cheng C, et al. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. In: Pacific Symposium on Biocomputing. New Jersey, USA: World Scientific, 2015, p. 132.
86. Khademi M, Nedialkov NS. Probabilistic graphical models and deep belief networks for prognosis of breast cancer. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). Piscataway, NJ, USA: IEEE, 2015, pp. 727–32.
87. Liang M, Li Z, Chen T, et al. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Trans Comput Biol Bioinform 2015;12:928–37.
88. Young JD, Cai C, Lu X. Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma. BMC Bioinformatics 2017;18:381.
89. Liang Z, Huang JX, Zeng X, et al. DL-ADR: a novel deep learning model for classifying genomic variants into adverse drug reactions. BMC Med Genomics 2016;9:48.
90. Tripathi R, Patel S, Kumari V, et al. DeepLNC, a long non-coding RNA prediction tool using deep neural network. Netw Model Anal Health Inform Bioinform 2016;5:1–14.
91. Zhang Y, Liu X, MacLeod JN, et al. DeepSplice: deep classification of novel splice junctions revealed by RNA-seq. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE, 2016, pp. 330–3.
92. Xu Y, Wang Y, Luo J, et al. Deep learning of the splicing (epi)genetic code reveals a novel candidate mechanism linking histone modifications to ESC fate decision. Nucleic Acids Res 2017;45:12100–12.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30:i121.
94. Pan X, Shen H-B. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics 2017;18:136.
95. Zhang Y-Z, Yamaguchi R, Imoto S, et al. Sequence-specific bias correction for RNA-seq data using recurrent neural networks. BMC Genomics 2017;18:1044.
96. Lee B, Baek J, Park S, et al. DeepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. New York, USA: ACM, 2016, pp. 434–42.
97. Park S, Min S, Choi H, et al. DeepMiRGene: deep neural network based precursor microRNA prediction. 2016. Preprint arXiv:1605.00017.
98. Yu L, Sun X, Tian S, et al. Drug and nondrug classification based on deep learning with various feature selection strategies. Curr Bioinform 2018;13:253–9.
99. Ching T, Zhu X, Garmire L. Cox-nnet: an artificial neural network Cox regression for prognosis prediction. 2016. bioRxiv:093021.
100. Chaudhary K, Poirion OB, Lu L, et al. Deep learning based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 2017;24:clincanres.0853.
101. Bhat RR, Viswanath V, Li X. DeepCancer: detecting cancer through gene expressions via deep generative learning. 2016. Preprint arXiv:1612.03211.
102. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23:1728–36.
103. Qi Y, Oja M, Weston J, et al. A unified multitask architecture for predicting local protein properties. PLoS One 2012;7:e32235.
104. Uziela K, Menéndez Hurtado D, Shu N, et al. ProQ3D: improved model quality assessments using deep learning. Bioinformatics 2017;33:1578–80.
105. Di Lena P, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28:2449–57.
106. Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017;18:303.
107. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ, USA: IEEE, 2014, pp. 2071–8.
108. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35:2040–6.
109. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12:103–12.
110. Jo T, Hou J, Eickholt J, et al. Improving protein fold recognition by deep learning networks. Sci Rep 2015;5:17573.
111. Eickholt J, Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinformatics 2013;14:88.
112. Li Z, Yu Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks. 2016. Preprint arXiv:1604.07176.
113. Wang S, Sun S, Xu J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 2017;82:208–11.
114. Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2017;34(9):1466–72.
115. Li H, Hou J, Adhikari B, et al. Deep learning methods for protein torsion angle prediction. BMC Bioinformatics 2017;18:417.
116. Kulmanov M, Khan MA, Hoehndorf R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2017;34(4):660–68.
117. Wang D, Zeng S, Xu C, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 2017;33(24):3909–16.
118. Jiménez J, Doerr S, Martínezrosell G, et al. DeepSite: protein binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017;33(19):3036–42.
119. Wei L, Ding Y, Su R, et al. Prediction of human protein subcellular localization using deep learning. J Parallel Distrib Comput 2017.
120. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. In: International Conference on Algorithms for Computational Biology. New York, USA: Springer, 2015, pp. 68–80.
121. Almagro JA, Sønderby CK, Sønderby SK, et al. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 2017;33(21):3387–95.
122. Wan F, Zeng J. Deep learning with feature embedding for compound-protein interaction prediction. 2016. bioRxiv:086033.
123. Zhao Z, Gong X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans Comput Biol Bioinform 2017;PP(99):1–1.
124. Verborgh R, Wilde MD. Using OpenRefine. Birmingham, England: Packt Publishing, 2013.
125. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol 1990;215:403–10.
126. Shen J, Zhang J, Luo X, et al. Predicting protein-protein interactions based only on sequences information. Proc Natl Acad Sci U S A 2007;104:4337–41.
127. Zhao Y. Predicting protein-protein interactions from protein sequences using probabilistic neural network and feature combination. J Inform Comput Sci 2014;11:2397–406.
128. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci U S A 2005;102:6395.
129. Eickholt J, Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinformatics 2013;14:1–10.
130. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 770–8.
131. Hinton GE, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. Comput Sci 2012;3:212–23.
132. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems. 2012, pp. 2951–9.
133. Bahrampour S, Ramakrishnan N, Schott L, Shah M. Comparative study of Caffe, Neon, Theano, and Torch for deep learning. In: Proceedings of the 2016 International Conference on Learning Representations, San Juan, PR, USA, 2015, pp. 1–11.
134. Shi S, Wang Q, Xu P, et al. Benchmarking state-of-the-art deep learning software tools. 2016. Preprint arXiv:1608.07249.
135. Palatucci M, Pomerleau D, Hinton G, et al. Zero-shot learning with semantic output codes. In: International Conference on Neural Information Processing Systems. 2009, pp. 1410–8.
136. Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell 2006;28:594–611.
137. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. Adv Neural Inf Process Syst 2014;3:2672–80.
138. Kukar MZ, Kononenko I. Cost-sensitive learning with neural networks. In: The 13th European Conference on Artificial Intelligence (Brighton, UK). Hoboken, NJ, USA: IOS Press, 1998, pp. 445–9.
139. Lanchantin J, Singh R, Wang B, et al. Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks. Pac Symp Biocomput 2016;22:254.
140. Sutton RS, Barto AG. Reinforcement learning: an introduction (A Bradford Book). IEEE Trans Neural Netw 2005;16:285–6.
141. Polikar R, Upda L, Upda SS, et al. Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybern C Appl Rev 2001;31:497–508.
142. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22:1345–59.
143. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: International Conference on Robotics and Automation 2015 (Washington, USA). Piscataway, NJ, USA: IEEE, 2015, pp. 2605–12.

Author notes

These authors contributed equally to this work.

