MAGIA2 (http://gencomp.bio.unipd.it/magia2) is an update, extension and evolution of the MAGIA web tool. It is dedicated to the integrated analysis of in silico target prediction, microRNA (miRNA) and gene expression data for the reconstruction of post-transcriptional regulatory networks. miRNAs are fundamental post-transcriptional regulators of several key biological and pathological processes. As miRNAs act prevalently through target degradation, their expression profiles are expected to be inversely correlated to those of the target genes. Low specificity of target prediction algorithms makes integration approaches an interesting solution for target prediction refinement. MAGIA2 performs this integrative approach supporting different association measures, multiple organisms and almost all target predictions algorithms. Nevertheless, miRNAs activity should be viewed as part of a more complex scenario where regulatory elements and their interactors generate a highly connected network and where gene expression profiles are the result of different levels of regulation. The updated MAGIA2 tries to dissect this complexity by reconstructing mixed regulatory circuits involving either miRNA or transcription factor (TF) as regulators. Two types of circuits are identified: (i) a TF that regulates both a miRNA and its target and (ii) a miRNA that regulates both a TF and its target.
MicroRNAs (miRNAs) are fundamental post-transcriptional regulators of several biological processes, whose alterations have been demonstrated in a number of diseases and in almost all human cancers (1–5).
miRNA imperfectly bind the 3′-untranslated region (3′-UTR) of their target mRNAs and may cause translation inhibition and/or mRNA cleavage and degradation (6,7). miRNAs can have multiple targets and a single protein-coding gene can be targeted by multiple miRNAs. In this perspective, the network of post-transcriptional regulatory relationships is expected to have a highly complex connectivity structure.
In silico target identification is based on (i) sequence similarity search, possibly considering target site evolutionary conservation and (ii) thermodynamic stability. However, it is known that the results of target prediction algorithms are characterized by very low specificity (8). This is caused both by the limited comprehension of the molecular basis of miRNA-target pairing and by the context-dependency of post-transcriptional regulation due to the cooperative interactions of different miRNAs. The integration of target predictions with miRNA and gene expression profiles has been recently proposed to improve the detection of functional miRNA-target relationships (9,10). As miRNAs act prevalently through target degradation, expression profiles of miRNA and target genes/transcripts are expected to be inversely correlated.
Although some studies have demonstrated that the regulation of single key targets may largely explain the function of a given miRNA in tumorigenesis or metastasis development (11–13), the mechanisms of RNA regulation are emerging to be much more complex than previously expected. miRNA activity should be viewed as part of a complex system where regulatory elements and their interactors generate a highly connected network, which regulates gene expression on multiple levels. Unfortunately, from an experimental point of view, the cooperation of different miRNAs acting on more than one target and the interplay with other regulatory levels are still challenging to be proved.
The gene regulation at the transcriptional level is, in general, quantitatively stronger than that occurring post-transcriptionally. Both levels impact on mRNA/genes expression profiles. Indeed, the miRNAs expression level can be activated or repressed by transcription factors (TFs), whereas mRNAs encoding TFs can be silenced by miRNAs. miRNAs and TF can share targets. Thus, miRNAs and TFs can form feedback or feed-forward loops, cooperating to tune gene expression (14) and playing critical roles in various biological processes. In a seminal review, Inui et al. (15) describes the role of miRNAs in pathways as decision maker elements to discriminate real versus too weak or too transient signals. In this context, the association between miRNAs and target expression profiles is generally more complex than linear correlation.
Although many researchers have attempted to understand the mechanism of miRNAs to regulate target genes and their roles in various diseases, the study of miRNA regulation by TFs (TF–miRNA regulation) has been relatively limited (16). Some attempts have been reported in order to computationally reconstruct regulatory circuits concerning miRNA–TFs and their joint targets (17–19). However, the identification of miRNA promoters is a challenging task. As promoter regions could be several thousands of base pairs upstream of the primary-miRNA, a correct way of identifying putative miRNA regulatory regions was not clearly established so far. To this aim, a few databases have been developed to store experimentally validated TF-miRNAs interactions (16,20).
Although several web tools (21–27) were recently developed in order to enhance the functional insights of miRNA functions and to improve miRNA–mRNA interactions identification almost none of them face the complex problem of the reconstruction of the above-mentioned TF–miRNA–target circuits.
Here, we present MAGIA2 (miRNA and genes integrated analysis 2012 update, freely available at http://gencomp.bio.unipd.it/magia2) an updated and fully redesigned release of MAGIA (21), a web application for the integrative analysis of in silico target prediction, miRNA and gene expression data. As in the previous release, MAGIA2 refines in silico target prediction through miRNA–target expression association measures. MAGIA2 extends the analysis, supporting multiple organisms (human, mouse, rat and drosophila), and using a greatly expanded list of target predictions algorithms. It allows the combination of multiple predictions and the selection of user-defined thresholds for each one. The MAGIA2 architecture was completely redesigned: both data and analyses results are stored in a user-specific environment keeping the data private. Furthermore, the updated MAGIA2 tries to dissect regulatory complexity reconstructing mixed regulatory circuits involving either human miRNA or TF as regulators (Figure 1A). In particular, two types of mixed regulatory circuits are identified: (i) a TF that regulates both a given miRNA and its target gene; (ii) a miRNA that regulates both a given TF and its regulated gene (Figure 1B). In order to help the user in the interpretation of the results, novel features are implemented into MAGIA2. Among them (i) miRNA–target interactions experimentally validated [as reported in miRecords (28), TarBase (29) and mirTarBase (30) databases] are specifically marked; (ii) Functional enrichment of the gene network component can be performed directly from MAGIA2 using DAVID platform.
WEB TOOL IMPLEMENTATION
In Figure 2, MAGIA2 architecture comprises three main sections: (i) data upload, (ii) data analysis and methods setup and (iii) results visualization, browsing and linking to external knowledgebase and tools.
In the first section, the user selects the organism, defines the gene or transcript ID used (EntrezGene, RefSeq, ENSEMBL gene or transcript are allowed) and uploads miRNA and gene (or transcript) expression data. In this way, a private MAGIA environment is created. Input files must be tab-delimited matrices (samples on the columns and, genes/transcripts or miRNAs on the rows).
Gene and miRNA expression profiles should be already normalized and filtered. Specifically, mRNA and miRNA profiles could be generated using different platforms and could be pre-processed with different algorithms as correlation coefficient and mutual information are invariant under a linear transformation.
Platform-specific probes (Affymetrix and Agilent IDs) are not unique, with several probes mapping on the same gene/transcript and being frequently associated with not concordant expression. Besides, over 30% of the Affymetrix probes do not correctly map to the genes they are supposed to belong (31,32). For these reasons, in the expression signal reconstruction phase, we suggest the user (i) to use custom CDF for Affymetrix technology and (ii) to choose the best probe (in term of quality check flag, or using other criteria) for Agilent technology.
The first columns of matrices should contain miRNA and gene/transcripts IDs of supported organisms; the header lines should be set according to the experimental design chosen. MAGIA2 considers two different experimental designs: (i) matched case—gene/transcript and miRNA expression data from the same biological samples and (ii) non-matched case—mRNA/transcript and miRNA expression data on different biological samples, belonging to the same set of classes, possibly resulting in different sample sizes. When using matched profiles the names of the columns of miRNA and gene/transcript matrices should correspond exactly; otherwise, MAGIA2 input files requires that column labels represent sample classes. When matrices have been successfully uploaded, the user has now created his personal environment.
No registration is required to use MAGIA2. To map a user to his personal environment, two standard mechanisms for session management are used: (i) a cookie is set in the user browser; (ii) a session id is appended to each web application url.
The second mechanism allows users to share their personal environment (data and analyses) with collaborators. Moreover, the design of MAGIA2 allows two or more users to perform analyses at the same time on a shared environment without explicit notification or locking.
After data upload, the user can setup a new analysis using data available in his environment and specify settings in a few simple steps.
Step 1: selection of expression matrices
After creating the ‘My data’ environment, the user can select two matrices, one for gene/transcripts and one for miRNA expression data. As the presence of miRNAs and genes with poorly variable expression profiles may introduce noise and produce false-positive associations, by default MAGIA2 filters profiles according to their expression variability, calculated through the variation coefficient (CV): the 25% less variable profiles (both for miRNAs and for genes/transcripts) are removed. This variability filter can be explicitly skipped using a check box in case the data selected by the user are already filtered. This may apply to a user pre-selected subset of differentially expressed miRNAs that need to be all considered for network reconstruction.
Step 2: selection of the association measure
The selection of the association measures is strictly dependent on the experimental design chosen by the user. For matched design, three methods for expression profiles combination are available (parametric, non-parametric linear correlation and association measure based on information theory), while for un-matched design only a meta-analysis is possible. The Spearman correlation is a non-parametric, rank-based linear correlation measure suitable for non-normally distributed data and/or small sample size (e.g. 3–5). Pearson correlation is a parametric linear correlation measure, suggested for normally distributed data and medium-large sample size (>5). Mutual information is an information measure quantifying the mutual dependence of variables, including non-linear relationships and in general is suitable for large sample size (at least 20 needed).
Meta analysis approach is based on the combination of P-values of differential expression (using empirical Bayes test) (33), separately for genes and miRNAs across sample classes, using the inverse chi square distribution to identify oppositely variable miRNA–target pairs (21).
Step 3: selection of the target prediction algorithms
MAGIA2 makes use of eight different catalogs of target predictions: Microcosm (34), microrna.org (35), DIANA-microT (36), miRDB (37), PicTar (38), PITA (39), RNA22 (40) and TargetScan (41). Unfortunately not all of them use the same identifiers to refer to genes and transcripts. MAGIA2 relieves the user from directly addressing this issue by converting identifiers at the start of each analysis. In case of a conversion requiring multiple steps (e.g. if the user upload matrices with Ensemble gene Ids, their conversion to Refseq would require two steps: Ensemble gene ID—Ensemble transcript ID and Ensemble transcript ID—RefSeq), MAGIA2 warns the user. ID conversion, from one transcript ID to another, in fact usually leads to the loss of a noticeable amount of identifiers and potentially incorrect assignments because there is not a one to one relationship between transcripts identifiers of major sequence databases. Predictions may be further filtered using their scores (MAGIA2 tutorial provides information to guide the user in this selection procedure) and multiple targets database may be combined, by taking their intersection.
Integrated analysis: interaction strength quantification and regulatory circuits identification human and mouse data
Using human and mouse data, MAGIA2 combines expression profiles analysis with three sets of regulatory interaction predictions, i.e. information about TFs and miRNAs that may regulate a gene/transcript expression at transcriptional and post transcriptional level, respectively, and about transcriptional regulation of TF on miRNAs.
In this way, MAGIA2 identifies two types of mixed regulatory circuits: (i) a TF regulating both a given miRNA and its target gene; and (ii) a miRNA regulating both a given TF and its regulated gene.
More in detail, miRNA–target interactions derive from the combination of Steps 2 and 3 above. Namely, only the subset of predicted miRNA–target interactions associated to informative expression profiles relationship measures (e.g. high correlation) are included in the network.
Regarding TFs, we used experimentally validated TF–miRNA interactions reported in mirGen2.0 (17) and TransmiR (42), whereas TF–gene interactions were obtained from ECRbase database (43) (http://ecrbase.dcode.org/) for mouse organism and from the ‘TFBS conserved’ track of the UCSC genome annotation for human (version hg19). Each annotated TF binding site was associated with a RefSeq transcript if it lied within 10Kb of its 5′-UTR and 3 Kb of its 3′-UTR. To be more stringent, we required a prediction z-score of at least 3. This resulted in 304,035 total associations: each transcript is linked on average to 11.7 TFBS. By comparison, TargetScan predicts that on average 12.5 microRNAs interact with each transcript.
Thus, association measures described in Step 2 are estimated not only for miRNA–target interactions predicted in Step 3 but also for TF–target relations (TF-gene and TF–miRNA) retrieved from public databases.
Each circuit type might be the result of several different combinations of induction/repression interactions. Using correlation, significant circuits can include different combinations of positive and negative values, highlighting different types of putative regulatory interactions (e.g. TF that inhibits its target mRNA but that activates miRNA expression, a TF that activates both elements). MAGIA2 allows the user to download all the circuits in a flat file that can be easily open with a excel sheet for further processing, filtering and analysis and highlight edges according to the sign of the association measure. With this file, the user can check the correlation associated to each edge and speculate on their functional relations.
Rat and Drosophila data
Using rat and drosophila data association measures described in Step 2 are estimated only for those miRNA–target interactions predicted in Step 3. TF-target and TF-miRNA information is unavailable.
It is worth noticing that when MAGIA2 is used for analysis of miRNA and transcript expression profiles, each alternative transcript of the same gene is treated as a separate entity. In this case, network and circuits visualization and interaction tables report information on individual transcript (and of the gene to which it belongs). The use of transcripts expression increase network complexity but also adds relevant information on transcript heterogeneity, especially when transcript expression is adequately quantified.
MAGIA2 results are reported as interactive images and dynamic tables (Figure 3). According to the analysis setting, MAGIA2 shows (i) the top 200 functional interactions (between miRNAs, genes and TFs); (ii) top 20 mixed regulatory circuits and (iii) top 40 interactors of a user-selected network node. For each network, two or more tables report different types of supported relationships. In the general view, miRNA–target and TF–target relations are shown, whereas in the circuits visualization mode, triplets of elements involved in type 1 and 2 circuits are listed separately. Focusing on a specific gene/transcript, TF or miRNAs, all direct interactors are shown in appropriate tables. Experimentally validated miRNA–target interactions according to the miRecords (28), mirTarbase (29) and mirTarBase (30) are highlighted in tables. Furthermore, for each element involved in a supported interaction, MAGIA2 provides links to external miRNA- and gene-related databases to help the user in results interpretation. Complete results can be browsed and download as tab delimited flat files.
From the general network view, a DAVID enrichment analysis can be automatically started. MAGIA2 transmits the whole set of genes and TFs represented in the network directly to the DAVID tool for functional enrichment analyses. The default enrichment score provided by DAVID is estimated on the whole genome; however, the user can upload the list of the platform IDs as background to perform a different analysis. This feature may allow the identification of molecular functions under the control of a specific set of transcriptional and post-transcriptional regulators.
CASE STUDY: THE RECONSTRUCTION OF REGULATORY CIRCUITS IN NCI-60 CELL LINES
As a benchmark case study, we used the miRNA and target expression profiles of the NCI-60, a panel of 60 human cancer cell lines from several distinct tissues (44). The NCI-60 expression data were collected using, respectively, the Affymetrix HG-U133A platform and the Ohio State University Comprehensive Cancer Center miRNA platform, OSUCCC, version 3.0. Both data sets are available at the ArrayExpress database (mRNA: E-GEOD-5720, miRNA: E-MEXP-1029).
We selected EntrezGene IDs, Pearson correlation measure and the intersection of TargetScan and DIANA-microT target prediction algorithms. We found a total number of 970 interactions with Pearson correlation r > 0.4 in absolute value and with FDR < 0.05 (Table 1). Among these, 13 miRNA–target interactions (two with positive and 11 with negative correlations) are experimentally validated. Given the dense connectivity of the entire network, MAGIA2 shows the sub-network derived from those interactions with r > 0.4 or r < −0.4, for a maximum of 200 interactions (50 miRNA–gene, 50 miRNA–TF, 50 TF–miRNA and 50 TF–gene). Figure 3 shows the resulting network for this case study (50 miRNA–gene, 1 miRNA–TF, 50 TF–gene and 31 TF–miRNA). Whole results can be downloaded as a tab delimited file with specific attributes that can be directly imported into Cytoscape (45).
|Interaction type||Pos. corr.||Neg. Corr.||TOT|
|miRNA–mRNA||216 (44%)||281 (56%)||497|
|TF–miRNA||14 (46%)||16 (54%)||30|
|TF–gene||120 (27%)||325 (73%)||444|
|TOT||350 (36%)||622 (64%)||970|
|Interaction type||Pos. corr.||Neg. Corr.||TOT|
|miRNA–mRNA||216 (44%)||281 (56%)||497|
|TF–miRNA||14 (46%)||16 (54%)||30|
|TF–gene||120 (27%)||325 (73%)||444|
|TOT||350 (36%)||622 (64%)||970|
miRNA–target interactions include a total number of 17 different miRNAs and of 39 different genes while TF–target interactions include a total number of 27 different TFs and 72 different TF targets. Among the best interactions, we found three miRNA–target relationships experimentally validated as reported in TarBase and miRecords: miR-223 with LMO2 (46) positively correlated with r = 0.79, miR-221 with ESR1 and miR-222 with ESR1 (47) inversely correlated with respectively r = −0.60 and r = −0.59. Interestingly, the interaction between miR-223 with LMO2 is the interaction with the highest score.
Among the top TF–target interactions, we found GATA1 and STAT5A factors. GATA factors are unique TFs with conserved DNA-binding domains. They serve diverse roles in embryogenesis, cell differentiation, regulation of tissue-specific genes and carcinogenesis. Multiple studies have analyzed the role of GATA factors in gastrointestinal malignancy, such as those of the stomach, pancreas and colon, and premalignant lesions such as Barrett’s esophagus (48). TF STAT5A, on the other hand, has been shown to play a critical role in prostate and breast cancer growth and differentiation (49).
Pathways enrichment analysis, conducted on target and TF genes, leads to relevant and interesting results: TGF-beta signaling pathway (P = 0.047, PANTHER database) (50) and ErbB signaling pathway (P = 0.032, KEGG database) (51) are identified as two of the most enriched pathways. Perturbations of transforming growth factor-β (TGFβ) signaling are central to tumorigenesis and tumor progression through their effects on cellular process, including cell proliferation and cell invasion (52) while excessive ErbB signaling is associated with the development of a wide variety of types of solid tumor. ErbB-1 and ErbB-2 are found in many human cancers and their excessive signaling may be critical factors in the development and malignancy of these tumors (53).
The most interestingly new feature of MAGIA2 is the possibility to identify putative mixed regulatory circuits. In this analysis (Figure 3B), MAGIA2 highlights 20 circuits possibly relevant in tumor samples.
Circuits of type 2, where miRNA acts as master regulator of a TF and its target involve principally FOXF2 and miR-200, miR-506 and miR-204 families. Changes in miR-200 family levels have been associated with enhanced tumorigenesis and resistance to several chemotherapy drugs; in addition, they significantly correlate with decreased survival (54–56).
Among circuits of type 1, where a TF regulates either the miR and its target, we observed an interesting network composed of four circuits (triangles) involving GATA1, as master regulator, miR-144 and miR-181 family, and specific common target genes (EPB41, ANK1 and HIC1). Taken individually, different elements in this net are known to be involved in cancer development and progression. For instance, recent studies showed that miR-181a and miR-181b function as tumor suppressors, which trigger growth inhibition, induce apoptosis and inhibit invasion in glioma cells (57). EPB41 is known as tumor suppressor gene in meningioma pathogenesis (58). The master regulator of hematopoiesis, GATA-1 binds to DNA sites with the consensus sequence [AT]GATA[AG] within regulatory regions of globin genes and of other genes expressed in erythroid cells. It plays also important roles in hematologic malignancies (59) and it seems to directly activate transcription of genes encoding the essential autophagy component (60).
Further, and most interestingly, combined involvements in cancer (as indirect or direct interactions) of miRNAs and specific genes and TFs of this net have been reported.
Specifically, a study of prognostic significance of CEBPA mutations in cytogenetically normal acute myeloid leukemia with high-risk molecular features, and gene and microRNA expression signatures associated with CEBPA mutations (61) showed that up-regulation of specific genes, including GATA1 (ZFPM1, EPOR, and GFI1B) and miRNAs (i.e. the miR-181 family) are involved in erythroid differentiation and down-regulation of homeobox genes. Similarly Whitman et al. (62) studied FLT3-internal tandem duplications as adverse prognostic marker in primary cytogenetically normal acute myeloid leukemia and reported an FLT3-ITD-associated expression signature with over-expression of different genes, including ANK1 [FLT3, homeobox genes (MEIS1, PBX3, HOXB3) and immunotherapeutic targets (WT1, CD33) and underexpression of leukemia-associated (MLLT3, TAL1) and erythropoiesis-associated (GATA3, EPOR, ANK1, HEMGN) genes] and underexpression of miR-144.
Focusing more on causal relationships, it was demonstrated that the miR 144/451 locus, essential for erythropoiesis, is a direct transcriptional target of GATA1 (63). Chromatin immunoprecipitation showed a strong signal for GATA-1 occupancy at the erythroid cis-regulatory module (enhancer) located 2.8 upstream of the miR 144/451 transcriptional start site, containing two conserved GATA binding motifs. During erythroid lineage differentiation, GATA-2 binding, characteristic of early precursors, is gradually replaced by GATA-1 that induces gene activation in later stages of maturation.
In summary, we can affirm first that key cancer genes, TFs and miRNAs are represented in top circuits blindly identified by MAGIA2 mixed circuits reconstruction based on cancer cell lines expression data. Furthermore, both combined involvement in cancer signatures and/or the existence of direct regulatory interactions relevant for cell differentiation regarding different pairs among the set of TFs, miRNAs and target genes found in MAGIA2 mixed circuits clearly corroborates the biological significance of mixed regulatory circuits identified by MAGIA2.
Funding for open access charge: University of Padova [CPDA119031 to C.R.] to S.B. and to A.C.; Associazione Italiana per la Ricerca sul Cancro (AIRC, Milano) ‘Special Program Molecular Clinical Oncology 5x1000’ to AGIMM (AIRC-Gruppo Italiano Malattie Mieloproliferative).
Conflict of interest statement. None declared.
We acknowledge the CINECA Award N. HP10BDJ9X8, 2010 for the availability of high performance computing resources and support.