Diego A A Morais, Rita M C Almeida, Rodrigo J S Dalmolin, Transcriptogramer: an R/Bioconductor package for transcriptional analysis based on protein–protein interaction, Bioinformatics, Volume 35, Issue 16, August 2019, Pages 2875–2876, https://doi.org/10.1093/bioinformatics/btz007
Abstract
Several freely available tools analyze gene expression with algorithms designed to detect significant expression changes one gene at a time. The transcriptogramer R package instead uses protein–protein interaction to perform differential expression analysis of functionally associated genes. The software assesses the expression profile of entire genetic systems and reveals which biological systems are significantly altered in transcriptome experiments with a case-control design.
The R/Bioconductor transcriptogramer package projects expression values onto an ordered gene list to perform topological analysis, differential expression and gene ontology enrichment analysis, independently of data platform or operating system.
Supplementary data are available at Bioinformatics online.
1 Introduction
The available tools to assess gene expression, such as limma (Ritchie et al., 2015) and DESeq2 (Love et al., 2014), were designed to evaluate the expression of individual genes. However, gene products are known to interact with each other to collectively carry out biological functions and metabolic pathways (Li et al., 2018). The transcriptogram (Rybarczyk-Filho et al., 2011), a systems biology-based method to analyze transcriptomes, uses protein–protein interaction (PPI) to build an ordered gene list. Briefly, this method clusters genes by the probability that their products interact with each other, ordering them in one dimension. In other words, interacting genes are expected to lie close to each other on the ordered gene list. The ordered gene list is then used to take the average expression of functionally associated genes within a window of settable radius. This strategy reduces noise in gene expression data and improves measurement reproducibility (da Silva et al., 2014). This Applications Note introduces a new R package for transcriptional analysis based on transcriptograms. The transcriptogramer R package analyzes RNA-Seq and Microarray expression data by applying to transcriptograms the protocol adopted by the limma package, which is known to perform well and fast under many circumstances (Conesa et al., 2016).
2 Implementation
The transcriptogramer R package performs topological analysis, differential expression (DE) and gene ontology (GO) enrichment analysis. The workflow initially requires an ordered gene list and an edge list with gene connections. The package contains ordered gene lists for four species (Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Rattus norvegicus). Edge lists can be created from STRINGdb PPI. Using the initial inputs, it is possible to measure the degree and clustering coefficient of individual genes as well as the same properties of functionally associated genes in a window segment of the ordered gene list. The software also calculates graph properties such as average assortativity and clustering coefficient as a function of node connectivity. Transcriptograms are built in two steps that require expression data (Microarray or RNA-Seq), a dictionary (mapping proteins to the identifiers used as expression data rownames), and the initial inputs (the ordered gene list and the edge list). The first step assigns expression values (obtained from the expression data) to each respective gene in the ordered gene list. The second step assigns to each gene the average expression of its neighboring genes in the ordered gene list. To do so, it defines a sliding window centered on a given gene with a fixed radius (set by the user). The sliding window uses periodic boundary conditions to handle genes near the ends of the ordered gene list. The goal is to measure the average expression of functionally associated genes, represented by neighboring genes in the ordered gene list.
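The two-step construction described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the package's R implementation: the function names, the plain dict and list data structures, the one-to-one protein-to-row mapping, the choice to drop proteins that have no mapped expression row, and the interpretation of the radius as a number of list positions are all assumptions made for the example.

```python
from statistics import mean


def step1_assign(ordering, dictionary, expression):
    """Step 1 (sketch): attach expression values to each protein in the ordered list.

    ordering   : list of protein IDs in transcriptogram order
    dictionary : dict mapping protein ID -> expression row name
    expression : dict mapping row name -> list of per-sample values
    Proteins without a mapped expression row are dropped (an assumption).
    """
    assigned = []
    for protein in ordering:
        row = dictionary.get(protein)
        if row in expression:
            assigned.append((protein, expression[row]))
    return assigned


def step2_window_average(assigned, radius):
    """Step 2 (sketch): replace each gene's values by the mean over a window of
    2 * radius + 1 list positions centered on it, wrapping around the ends of
    the list (periodic boundary conditions)."""
    if not assigned:
        return []
    n = len(assigned)
    n_samples = len(assigned[0][1])
    smoothed = []
    for i, (protein, _) in enumerate(assigned):
        # The modulo index makes the list behave like a ring.
        window = [assigned[(i + k) % n][1] for k in range(-radius, radius + 1)]
        averages = [mean(values[s] for values in window) for s in range(n_samples)]
        smoothed.append((protein, averages))
    return smoothed


# Toy input: four proteins, one sample each (hypothetical identifiers).
ordering = ["p1", "p2", "p3", "p4"]
dictionary = {"p1": "g1", "p2": "g2", "p3": "g3", "p4": "g4"}
expression = {"g1": [1.0], "g2": [2.0], "g3": [3.0], "g4": [4.0]}

assigned = step1_assign(ordering, dictionary, expression)
smoothed = step2_window_average(assigned, radius=1)
```

With a radius of 1, the first gene in the toy list is averaged with its right-hand neighbor and, because of the wrap-around, with the last gene in the list.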
DE analysis evaluates expression variations of functionally associated genes in case-control experiments. As a result, the package provides a data.frame and plots a figure showing differentially expressed gene groups (DEGG) (Fig. 1a). PPI among inferred DEGG can be displayed as graphs (Fig. 1b), and GO enrichment analysis can be performed to identify the GO terms significantly associated with each DEGG (Fig. 1b). The complete transcriptogramer R package pipeline is shown in Supplementary Figure S1, and the main dependencies used by the package in the analysis are shown in Supplementary Table S1. The transcriptogramer package was implemented in R and is freely available from Bioconductor, the open source software project for bioinformatics. The package is cross-platform and runs on Windows, Linux and Mac OS X. transcriptogramer supports multithreading, allowing users to set the number of cores used by some methods, which reduces runtime.
3 Application examples
The transcriptogramer R package was applied to Microarray and RNA-Seq datasets of human colorectal adenocarcinoma cells HT-29 treated with two different concentrations of 5-aza-deoxy-cytidine (NCBI-BioProject, PRJNA177602) (Xu et al., 2013). The RNA-Seq reads were processed following a count-based protocol (Anders et al., 2013), and log2-counts-per-million values were obtained using the limma voom() function (the complete list of tools used in the protocol is shown in Supplementary Table S2 and the pipeline is shown in Supplementary Fig. S2). After processing, filtering, and normalization (a multidimensional scaling plot of the RNA-Seq samples is shown in Supplementary Fig. S3), data were analyzed using the transcriptogramer package pipeline as well as another pipeline combining functions from the limma and topGO (Alexa et al., 2006) packages. Samples (nine in total) were classified into three groups (control, and 5 µM and 10 µM 5-aza-deoxy-cytidine). Two case-control DE and functional enrichment analyses were performed on Microarray and RNA-Seq data for each pipeline. All methodologies identified a similar number of differentially expressed genes. However, the transcriptogramer R package identified nearly ten times more GO terms than classic DE analyses followed by topGO enrichment, in both Microarray and RNA-Seq data. A detailed comparison can be found in Supplementary Tables S3–S8.
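For readers unfamiliar with the log2-counts-per-million transformation mentioned above, the following Python sketch shows the per-sample formula used by voom, log2((count + 0.5) / (library size + 1) × 10^6). It is an illustration only: the function name and the dict-based input are hypothetical, library sizes are taken as raw column sums (no normalization factors are applied), and voom's precision-weight estimation is not reproduced.

```python
import math


def log2_cpm(counts):
    """Return log2-counts-per-million values per gene and sample.

    counts : dict mapping gene ID -> list of raw read counts, one per sample
    Library size is the raw column sum (an assumption of this sketch).
    """
    genes = list(counts)
    if not genes:
        return {}
    n_samples = len(counts[genes[0]])
    # Total number of counted reads in each sample.
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    return {
        g: [
            # Offsets of 0.5 and 1 keep the logarithm finite for zero counts.
            math.log2((counts[g][s] + 0.5) / (lib_sizes[s] + 1.0) * 1e6)
            for s in range(n_samples)
        ]
        for g in genes
    }


# Toy input: two genes, two samples (hypothetical identifiers and counts).
toy_counts = {"geneA": [10, 0], "geneB": [989, 999]}
toy_logcpm = log2_cpm(toy_counts)
```

The resulting values are on a log2 scale, so they can be used as expression input in the same way as normalized Microarray intensities.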
4 Discussion
The transcriptogramer package provides a systems biology-based pipeline for transcriptional analysis of RNA-Seq and Microarray data, designed to identify DE of functionally associated genes. Its advantage over existing functional analysis pipelines lies in using canonical PPI data to select gene sets that are not predefined, which makes it possible to identify the gene clusters responsible for a given GO term enrichment. Functionally associated genes have inherent hierarchical properties; thus, users can increase the window radius to merge gene sets and explore the enrichment results of larger systems. The transcriptogramer R/Bioconductor package is an easy-to-install, well-documented tool, developed to help bioinformaticians and biologists convert their gene expression data into biological insights.
Acknowledgements
The authors would like to thank the NPAD/UFRN.
Funding
This work was supported by the CNPq [grant 444856/2014–5] and the CAPES [grant 3381/2013].
Conflict of Interest: none declared.
References