RiceNet v2: an improved network prioritization server for rice genes

Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in the genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We show that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.


Benchmarking and integrating inferred functional links
Functional associations between genes were inferred from experimental and computational data by calculating a log likelihood score (LLS) within a Bayesian statistics framework. The LLS was calculated with the following equation:

$$\mathrm{LLS} = \ln\left(\frac{P(L \mid D)\,/\,P(\neg L \mid D)}{P(L)\,/\,P(\neg L)}\right)$$

where P(L|D)/P(¬L|D) is the odds of gold standard positives (P(L|D)) versus negatives (P(¬L|D)) for the given data D, and P(L)/P(¬L) is the odds of all gold standard positives (P(L)) versus negatives (P(¬L)). A functional link in the network can be supported by multiple data types with different LLSs. Because the data sets being integrated are not fully independent, naïve Bayesian integration is not appropriate. Hence, we integrated the data with the weighted sum (WS) formula, a modification of naïve Bayesian integration (6). The WS is defined as

$$\mathrm{WS} = S_0 + \sum_{i=1}^{n} \frac{S_i}{D \cdot i}, \quad \text{for all } S \geq T$$

where S is the LLS, S_0 is the best LLS and S_i is the LLS of the ith rank. D is a free parameter that sets the weight, and T is the minimum LLS threshold. The weighted sum takes the full score of the top LLS and only partial scores of the remaining LLSs, scaled down by the weight factor, to limit the addition of redundant information.
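The scoring and integration steps can be illustrated with a minimal Python sketch. It assumes that the odds are estimated from counts of gold standard positive and negative gene pairs, and that the weighted sum takes the rank-weighted form S_0 + Σ S_i/(D·i) over scores at or above T. The function names, argument names and example values are hypothetical and are not taken from the RiceNet v2 pipeline.

```python
import math


def log_likelihood_score(pos_in_data, neg_in_data, pos_total, neg_total):
    """LLS = ln( (P(L|D)/P(~L|D)) / (P(L)/P(~L)) ), estimated from pair counts.

    pos_in_data / neg_in_data: gold standard positive / negative pairs
    supported by the data set; pos_total / neg_total: all gold standard
    positive / negative pairs. All four counts are assumed to be non-zero.
    """
    odds_given_data = pos_in_data / neg_in_data
    prior_odds = pos_total / neg_total
    return math.log(odds_given_data / prior_odds)


def weighted_sum(scores, d, t):
    """Assumed form: WS = S_0 + sum_{i>=1} S_i / (d * i), using scores >= t.

    Scores are ranked in descending order, so S_0 is the best LLS and
    S_i is the LLS of rank i.
    """
    kept = sorted((s for s in scores if s >= t), reverse=True)
    if not kept:
        return 0.0
    return kept[0] + sum(s / (d * i) for i, s in enumerate(kept[1:], start=1))


if __name__ == "__main__":
    # Hypothetical counts: 80 positive and 20 negative pairs in the data set,
    # against 1000 positive and 9000 negative pairs in the gold standard.
    print(log_likelihood_score(80, 20, 1000, 9000))
    # Hypothetical LLSs for one link supported by four data types.
    print(weighted_sum([3.0, 2.0, 1.0, 0.5], d=2.0, t=1.0))
```

With D = 2 and T = 1, the score 0.5 falls below the threshold and is dropped, and the second- and third-ranked scores contribute only 2/(2·1) and 1/(2·2) on top of the best score.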

Inferring links from genomic context: Phylogenetic profile similarity (PG) and Gene neighborhood (GN)
A similar pattern of evolutionary conservation between two genes across species is often a consequence of their functional relatedness. This genomic context similarity enables us to infer co-functional links between genes. For constructing RiceNet v2, we used the two most widely used genomic context-based methods for inferring network links, phylogenetic profile similarity (PG) (7-9) and gene neighborhood (GN) (10)(11)(12). A total of 2,144 sequenced genomes were used (122 Archaea, 1,626 Bacteria and 396 Eukarya genomes). The phylogenetic profile similarity of two rice genes reflects their co-inheritance during speciation. Co-functionality of genes can be inferred from co-inheritance because genes that function together tend to be inherited together. To measure the probability of co-inheritance of two genes, we first ran BLASTp for all O. sativa genes against the 2,144 genomes. From the best BLASTp score in each genome, we constructed a phylogenetic profile matrix of 36,736 (number of O. sativa genes) by 2,144 (number of genomes). The association between two genes based on their phylogenetic profiles was measured by mutual information (MI) scores as described in Date et al. (13). We did not use the whole concatenated profile of the 2,144 genomes. Rather, sub-profiles for each of the three domains of life (Archaea, Bacteria, Eukarya) were used separately, which resulted in three networks. These were subsequently integrated into a single network. We found that this divide-and-integrate approach based on domain-specific phylogenetic profiles substantially increased network coverage and accuracy.
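The domain-specific mutual information calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure of Date et al. (13): it assumes that each profile has already been discretized (for example, into presence/absence or binned BLASTp scores), a step not specified here, and all function names and the column layout are hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(profile_a, profile_b):
    """MI(A, B) = H(A) + H(B) - H(A, B) for two discretized profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    joint = list(zip(profile_a, profile_b))
    return entropy(profile_a) + entropy(profile_b) - entropy(joint)


def domain_mutual_information(profile_a, profile_b, domain_columns):
    """Compute MI separately on the sub-profile of each domain of life.

    domain_columns is a hypothetical mapping from a domain name to the
    column indices (genomes) that belong to that domain.
    """
    return {
        domain: mutual_information(
            [profile_a[c] for c in cols], [profile_b[c] for c in cols]
        )
        for domain, cols in domain_columns.items()
    }


if __name__ == "__main__":
    # Toy presence/absence profiles over six hypothetical genomes.
    gene_a = [1, 0, 1, 1, 0, 0]
    gene_b = [1, 0, 0, 1, 0, 1]
    columns = {"Archaea": [0, 1], "Bacteria": [2, 3, 4, 5]}
    print(domain_mutual_information(gene_a, gene_b, columns))
```

Scoring each domain separately yields one MI value per gene pair per domain, which corresponds to the three networks that are later integrated.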
Two distinct measures of genomic neighborhood exist: i) direct physical distance between neighboring genes (11,12,14), and ii) neighborhood probability (10). There is evidence that these two measures are complementary (15). We reasoned that if the two methods give complementary information, both measurements can be useful. Thus, we inferred co-functional links with both measures and subsequently integrated them to generate a single GN co-functional network.

Inferring links from literature curated (LC) protein-protein interactions (PPI)
Observing protein-protein interactions (PPIs) in the cell is one of the most popular and reliable ways to discover functional associations between genes. To infer PPI-based functional associations for rice, we mined three PPI databases: DIP (16), IntAct (17) and MINT (18).

Inferring links from co-expression (CX) patterns
Genes with similar biological functions tend to be co-expressed across diverse biological contexts. High-dimensional microarray and RNA-seq data can therefore be used to infer co-functional links between co-expressed genes. We analyzed expression data sets based on four array platforms in the GEO (Gene Expression Omnibus) database (19): GPL2025, GPL13160, GPL6864 and GPL8852. To infer co-functional linkages from co-expression patterns, we first created a vector for each gene that contains its expression profile across the microarray experiments (GEO samples) in each GEO series. We then calculated all pairwise Pearson correlation coefficients between vectors to quantify co-expression. GEO series with fewer than 12 samples were not used because correlations measured on short vectors can generate many spurious co-expression patterns between genes. Each GEO series (see Supplementary Table 1) generated a single co-functional network.
Benchmarking against the gold standard set yielded 39 co-functional networks, which were further integrated into a single CX network for rice.
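The per-series correlation step can be sketched in Python as below. The sketch assumes that one GEO series is represented as a mapping from gene identifier to a list of expression values of equal length, one value per sample. The data layout, function names and the handling of constant vectors are illustrative assumptions, not details of the RiceNet v2 implementation.

```python
import math
from itertools import combinations

MIN_SAMPLES = 12  # series with fewer samples are skipped, as described above


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def coexpression_scores(series):
    """Map each gene pair in one series to its Pearson correlation.

    series is a hypothetical dict: gene ID -> expression values, one per
    sample, with every gene measured on the same samples.
    """
    n_samples = min((len(v) for v in series.values()), default=0)
    if n_samples < MIN_SAMPLES:
        return {}  # too few samples to measure correlation reliably
    scores = {}
    for gene_a, gene_b in combinations(sorted(series), 2):
        r = pearson(series[gene_a], series[gene_b])
        if r is not None:
            scores[(gene_a, gene_b)] = r
    return scores


if __name__ == "__main__":
    samples = [float(i) for i in range(12)]
    toy_series = {
        "geneA": samples,
        "geneB": [2 * v + 1 for v in samples],  # rises with geneA
        "geneC": [-v for v in samples],         # falls as geneA rises
    }
    for pair, r in coexpression_scores(toy_series).items():
        print(pair, round(r, 3))
```

In a full pipeline, the correlations from each series would then be benchmarked against the gold standard to assign an LLS before integration. That step is omitted here.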

Links transferred from other species' networks by orthology (Associalogs)
Many biological functions of genes are evolutionarily conserved between orthologs across species. This allows functional information about genes to be transferred from one species to another. We transferred co-functional linkages from the networks of other organisms to RiceNet v2 using the associalog method (20). The links were transferred from three organisms with published genome-scale functional gene networks: YeastNet v3 (21)