Chao Hou, Haotai Xie, Yang Fu, Yao Ma, Tingting Li, MloDisDB: a manually curated database of the relations between membraneless organelles and diseases, Briefings in Bioinformatics, Volume 22, Issue 4, July 2021, bbaa271, https://doi.org/10.1093/bib/bbaa271
Abstract
Cells are compartmentalized by numerous membrane-bounded organelles and membraneless organelles (MLOs) to ensure temporal and spatial regulation of various biological processes. A number of MLOs, such as nucleoli, nuclear speckles and stress granules, exist as liquid droplets within the cells and arise from the condensation of proteins and RNAs via liquid–liquid phase separation (LLPS). By concentrating certain proteins and RNAs, MLOs accelerate biochemical reactions and protect cells during stress, and dysfunction of MLOs is associated with various pathological processes. With the development in this field, more and more relations between the MLOs and diseases have been described; however, these results have not been made available in a centralized resource. Herein, we build MloDisDB, a database which aims to gather the relations between MLOs and diseases from dispersed literature. In addition, the relations between LLPS and diseases were included as well. Currently, MloDisDB contains 771 curated entries from 607 publications; each entry in MloDisDB contains detailed information about the MLO, the disease and the functional factor in the relation. Furthermore, an efficient and user-friendly interface for users to search, browse and download all entries was provided. MloDisDB is the first comprehensive database of the relations between MLOs and diseases so far, and the database is freely accessible at http://mlodis.phasep.pro/.
Introduction
Unlike membrane-bounded organelles, which are encapsulated by lipid bilayer membranes, membraneless organelles (MLOs) are not enclosed by membranes; however, they are morphologically and compositionally distinct from their surroundings and regulate various biological processes [1]. Recently, emerging evidence supports that MLOs are assembled via liquid–liquid phase separation (LLPS) [2–5], a process by which a single phase of macromolecules separates into distinct phases where the concentration of certain proteins and RNAs varies greatly [6–8]. The driving force of LLPS is the multivalent interaction among macromolecules, which is supported by linear repeats of modular interaction domains or intrinsically disordered regions (IDRs) in proteins [9–11].

LLPS enables MLOs to selectively enrich certain proteins and RNAs [7], and the enriched proteins and RNAs can accelerate biochemical reactions and protect cells during stress [12]. For example, by locally concentrating rRNA processing factors, nucleoli facilitate the transition of the precursor rRNA transcript into individual rRNA subunits [2]. By recruiting numerous antiviral proteins including RIG-I, PKR, OAS and RNaseL, stress granules enhance the innate immune response and viral resistance during viral infection [13]. A large number of studies on the composition, function and regulation of MLOs have been reported recently, and some specialized databases have been built to facilitate research in this field. PhaSepDB [14], LLPSDB [15], PhaSePro [16] and DrLLPS [17] are four LLPS-related protein databases based on experimental evidence or localization/association evidence; PhaSepDB and DrLLPS include MLO-localized proteins, and PhaSePro contains disease-associated mutations of LLPS proteins that have been demonstrated to affect LLPS. The RNA granule database [18] contains current literature evidence for genes or proteins associated with stress granules or P-bodies. MSGP [19] is a database of the protein components of mammalian stress granules. NSort/DB [20] provides access to intranuclear or subnuclear compartment associations for the mouse nuclear proteome. NOPdb [21] is a database of the nucleolar proteome. All these resources enhance our understanding of the composition of MLOs.
Dysfunction of MLOs or key components and regulators of MLOs may lead to detrimental consequences [7, 9, 12, 22, 23]. For example, BYSL depletion disrupts nucleoli assembly after mitosis, resulting in increased apoptosis and reduced tolerance of hepatocellular carcinoma cells to serum starvation [24]. Disrupted expression of ULK1 and ULK2 causes VCP dysregulation and defective stress granule disassembly, which contributes to inclusion body myopathy-like disease [25]. Despite notable progress in discovering the relations between MLOs and diseases, there is no centralized resource that gathers the related studies from dispersed literature. Hence, we developed MloDisDB (http://mlodis.phasep.pro/), which aims to provide a comprehensive map of MLO-associated diseases. Because LLPS underlies the assembly of MLOs, the relations between LLPS and diseases were included as well. All entries in MloDisDB were manually curated from the published literature and annotated with detailed information.
Materials and methods
Data collection and classification
The pipeline for the development of the MloDisDB is outlined in Figure 1. Related publications were obtained by searching NCBI PubMed using keywords including MLOs and diseases. For example, the search keywords for stress granule are ‘((stress granules [title/abstract]) or (stress granule [title/abstract])) and ((disease [title/abstract]) or (cancer [title/abstract]) or (neurodegeneration [title/abstract]))’. Twenty-nine MLOs were searched including Balbiani body, Cajal body, Chromatoid body, Cleavage body, DDX1 body, DNA damage foci, Gemini of Cajal body, Histone locus body, Insulator body, Mitochondrial RNA granule, Neuronal granule, Nuage, Nuclear pore complex, Nuclear speckle, Nuclear stress body, Nucleolus, OPT domain, P granule, Paraspeckle, P-body, PcG body, Pericentriolar matrix, Perinucleolar compartment, PML nuclear body, Postsynaptic density, Receptor cluster, Sam68 nuclear body, Stress granule and U body. LLPS and diseases were searched independently, and the complete search keywords list can be found in the database. By examining the full text of the relevant publications, the MLO–disease relations and LLPS–disease relations with comprehensive annotation were extracted.
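The keyword pattern quoted above can be sketched as a small query builder. This is only an illustration: the function names are hypothetical, and reusing the same three disease terms for every MLO is an assumption (the complete keyword list is the one provided in the database).

```python
# Disease terms taken from the stress granule example in the text.
# Assumption: the same terms are reused for other MLOs.
DISEASE_TERMS = ("disease", "cancer", "neurodegeneration")


def tagged(term, tag="title/abstract"):
    """Wrap one search term with a PubMed field tag, e.g. (term [title/abstract])."""
    return f"({term} [{tag}])"


def build_query(mlo_terms, disease_terms=DISEASE_TERMS):
    """Combine MLO name variants and disease terms into one boolean query string."""
    mlo_part = " or ".join(tagged(t) for t in mlo_terms)
    disease_part = " or ".join(tagged(t) for t in disease_terms)
    return f"({mlo_part}) and ({disease_part})"


# Reproduces the stress granule query quoted in the text.
stress_granule_query = build_query(["stress granules", "stress granule"])
print(stress_granule_query)
```

Passing the singular and plural forms explicitly, rather than deriving one from the other, avoids guessing at irregular names such as 'nucleolus'.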
The factors that function in the relations include protein, RNA and others (such as chemicals and artificially synthesized oligonucleotides, peptides). Some entries described that the perturbations of the MLOs were related to the diseases, and no factor was described in the original publication. ‘None’ was recorded for these entries. NCBI gene IDs were provided for RNA factors. NCBI gene IDs and UniProt IDs were provided for protein factors. Expression changes, mutations and post-translational modifications (PTMs) of the factors were extracted from original publications. Expression changes were recorded as upregulation or downregulation. Mutations were recorded according to the HGVS mutation nomenclature [26]. PTM information included positions, residues, modifications and the catalyzing enzymes.
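The annotation rules in this paragraph can be pictured as a record with a few validation checks. The sketch below is a hypothetical schema, not the database's actual data model: the class names, field names and the placeholder identifiers in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FACTOR_TYPES = {"protein", "RNA", "other", "None"}
EXPRESSION_CHANGES = {"upregulation", "downregulation"}


@dataclass
class PTM:
    """One post-translational modification: position, residue, type, enzyme."""
    position: int
    residue: str
    modification: str
    enzyme: Optional[str] = None


@dataclass
class Factor:
    """Hypothetical record for the functional factor of one entry."""
    name: str
    factor_type: str
    ncbi_gene_id: Optional[str] = None
    uniprot_id: Optional[str] = None
    expression_change: Optional[str] = None
    mutations: List[str] = field(default_factory=list)  # HGVS-style strings
    ptms: List[PTM] = field(default_factory=list)

    def __post_init__(self):
        if self.factor_type not in FACTOR_TYPES:
            raise ValueError(f"unknown factor type: {self.factor_type}")
        if self.expression_change not in (None, *EXPRESSION_CHANGES):
            raise ValueError(f"unknown expression change: {self.expression_change}")
        # RNA factors carry an NCBI gene ID; protein factors also need a UniProt ID.
        if self.factor_type in ("protein", "RNA") and not self.ncbi_gene_id:
            raise ValueError("protein and RNA factors need an NCBI gene ID")
        if self.factor_type == "protein" and not self.uniprot_id:
            raise ValueError("protein factors need a UniProt ID")


# Placeholder identifiers, not real database content.
example = Factor("FACTOR1", "protein", ncbi_gene_id="0000", uniprot_id="P00000",
                 expression_change="upregulation")
no_factor = Factor("None", "None")  # entry where only the MLO perturbation is described
```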
MLO components, known LLPS proteins, LLPS-related predictions
To give a more comprehensive view of the MLOs in MloDisDB, the components for each MLO were listed in the MLO detail page, which were collected from five resources: UniProt [27], Gene Ontology [28], The Human Protein Atlas [29], COMPARTMENTS text mining channel [30] and Protein Universal Reference Publication-Originated Search Engine (PURPOSE) [31]. Known LLPS proteins were collected from the reviewed proteins of PhaSepDB [14] with in vitro experiment evidence, the reviewed scaffolds of DrLLPS [17], the proteins in PhaSePro [16] and the natural proteins of LLPSDB [15]; these proteins were listed in LLPS detail page. The components of MLOs and known LLPS proteins were mapped to UniProt, and only human reviewed proteins were included in the database.
To measure the LLPS property of the protein factors in MloDisDB, LLPS-related predictions were performed via CatGRANULE [32], PScore [33], PSPer [34], R + Y [35], PLAAC [36] and PAPA [37]. MobiDB-lite [38], which provides consensus IDR prediction, and DisoRDPbind [39], which predicts the RNA-, DNA- and protein-binding residues located in IDRs, were included as well. In MloDisDB, R + Y was calculated as (N_R*N_Y)/6500. N_R and N_Y represent the number of arginine and tyrosine residues in IDRs, and IDRs were defined as the regions whose MobiDB-lite scores ≥3/8.
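The R + Y calculation described above can be sketched in a few lines. The input format is an assumption: the function expects one MobiDB-lite consensus score per residue, expressed as a fraction between 0 and 1, and the function name is hypothetical. Only the threshold (3/8) and the scaling constant (6500) come from the text.

```python
def r_plus_y(sequence, consensus_scores, threshold=3 / 8, scale=6500):
    """Return (N_R * N_Y) / scale, counting only residues inside IDRs.

    A residue is treated as part of an IDR when its consensus score
    is at least `threshold`.
    """
    if len(sequence) != len(consensus_scores):
        raise ValueError("need exactly one consensus score per residue")
    n_r = n_y = 0
    for residue, score in zip(sequence.upper(), consensus_scores):
        if score >= threshold:
            if residue == "R":
                n_r += 1
            elif residue == "Y":
                n_y += 1
    return n_r * n_y / scale


# Toy input: the last two residues fall below the threshold and are ignored,
# so N_R = 2 and N_Y = 1.
toy_score = r_plus_y("RRYAYR", [0.5, 0.5, 0.5, 0.5, 0.1, 0.2])
```

Because the score is a product, a protein with arginines but no tyrosines in its IDRs (or the reverse) scores zero.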
Results
Database summary
MloDisDB is the first literature-based database of the relations between MLOs and diseases, and the relations between LLPS and diseases were included as well (Figure 1). In the current release, 3877 publications published before 1 April 2020 were obtained by searching MLOs and diseases as well as LLPS and diseases in NCBI PubMed. After a manual check of the full texts, 719 MLO–disease entries and 52 LLPS–disease entries were extracted from 607 publications. Detailed information, including the organisms and cell lines used in experiments, the changes of MLOs, the functional factors, the changes of factors and descriptions of the relations, was provided. As shown in Figure 2A, 29 MLOs were searched separately and 15 of them possessed relevant entries. Stress granule and nucleolus possess the most entries in MloDisDB, and they are the most widely studied MLOs in the cytoplasm and nucleus, respectively. MLO–disease entries were further classified into 273 MLO-changed and 446 MLO-unchanged entries based on whether the MLOs changed or not in the relations. For MLO-changed entries, the size, number, assembly and dynamic changes of the MLOs were recorded. As shown in Figure 2B, the assembly and number of MLOs were abnormal in many disease conditions.

Figure 2. Statistical analysis of the entries in MloDisDB. (A) Distribution of the entries across MLOs and LLPS. (B) Distribution of the changes of MLOs. (C) Distribution of the entries across four disease categories; the nervous system diseases and cancers with the most entries are listed in more detail. (D) Distribution of the entries across functional factor types; the proteins with the most entries are listed in more detail. (E) Distribution of the changes of factors.
The diseases in MloDisDB were mapped to the Disease Ontology database [40], Online Mendelian Inheritance in Man (OMIM) [41], Medical Subject Headings (MeSH) [42] and ICD-10 Clinical Modification [43] and were classified into four categories: (1) nervous system diseases, (2) cancer, (3) other diseases such as infectious diseases and anemia and (4) biological processes such as apoptosis and aging. The last category means that the MLO was important in the biological process, but the original publication did not show a direct link with disease. As shown in Figure 2C, nervous system diseases and cancer accounted for 325 and 250 entries, respectively; they were the most widely studied MLO- and LLPS-related diseases [22]. A key feature in nervous system diseases is the abnormal aggregation of proteins. A number of studies have proven that aberrant LLPS and liquid-to-solid transitions of MLOs are involved in the formation of pathological protein aggregates [7, 9, 22, 44]. Many nervous system diseases, such as amyotrophic lateral sclerosis, Alzheimer’s disease, frontotemporal dementia and spinal muscular atrophy, are related to MLOs or LLPS in many entries in MloDisDB. Cancers are complex diseases with many different genetic causes; recent studies suggested that MLOs and LLPS may play important roles in tumorigenesis and progression via regulating cellular signaling and transcription activation [7]. Breast cancer, prostate cancer and ovarian cancer are the top three cancers in MloDisDB.
The factors that function in the MLO–disease relations and LLPS–disease relations include proteins, RNAs and others. As shown in Figure 2D, proteins accounted for 80% of all entries, including many well-studied LLPS proteins, such as TDP-43 [45], FUS [46], C9orf72 [47], HNRNPA1 [4] and PSD-95 [48]. RNAs are important scaffolds for many MLOs [1], and 27 RNA-related entries were included in MloDisDB. LLPS is a concentration-dependent process [49]; therefore, the expression levels of certain components can affect the assembly of MLOs. Besides, mutations and PTMs can change the charge, hydrophobicity, size and structure of the proteins, which may impact the assembly of LLPS and MLOs by altering the intermolecular contacts [50]. Thus, 230 expression changes, 197 mutations and 47 PTMs of the functional factors were extracted and included in MloDisDB (Figure 2E).
Each entry was assigned one of three evidence levels based on the original publication. (1) ‘Direct experiment’: an abnormality of the MLO or factor causes typical symptoms of the disease in model organisms or cell lines. (2) ‘Indirect experiment’: an abnormality of the MLO or factor brings about certain changes that are indicative of the development of the disease, or the original publication focuses on a known disease-causing factor and the disease-related changes of the factor perturb the MLO. (3) ‘Clinical Investigation’: the relation is extracted from an investigation of clinical samples or of drug usage.
Several related resources, including the components of MLOs [27–31] and known LLPS proteins [14–17], were integrated in MloDisDB as well. Although LLPS underlies the formation of MLOs, not all factors in MloDisDB have been reported to be LLPS proteins. Sequence features such as RNA-binding domains [51], prion-like domains [52] and pi–pi contacts [33] have been reported to facilitate LLPS, and several predictors have been developed to predict the LLPS ability based on these features. Therefore, LLPS-related predictions [32–39] were provided to give a more comprehensive understanding of the protein factors.
Analyses of MloDisDB factors
The functional factors in MloDisDB play important regulatory roles in the MLO–disease relations or LLPS–disease relations. Gene ontology and DisGeNET enrichment [53] were conducted to systematically analyze the functions and disease connections of protein and RNA factors. As shown in Figure 3A, MloDisDB factors mainly regulated RNA and protein localization and transport-related biological processes, which are among the most important functions of MLOs. As shown in Figure 3B, MloDisDB factors were strongly associated with nervous system diseases such as amyotrophic lateral sclerosis and anxiety, which is consistent with related reviews [7, 22, 54] and the disease proportion in our database.

Figure 3. Enrichment analyses and LLPS-related predictions of MloDisDB factors. (A) Gene ontology biological process enrichment analysis of MloDisDB factors. (B) DisGeNET enrichment analysis of MloDisDB factors. (C) Venn diagram of known LLPS proteins and MloDisDB protein factors. (D) Comparison of the LLPS-related predictions between MloDisDB protein factors and the human proteome; MloDisDB protein factors were further classified based on whether they were known LLPS proteins, and the scores in the table are the mean values across proteins. *Two-sided t-test P < 1e−2, **two-sided t-test P < 1e−4, ***two-sided t-test P < 1e−6. (E) MloDisDB protein factors that had the top 10 highest LLPS-related predictions and were not known LLPS proteins.

Figure 4. MloDisDB interface. (A) Home page: users can search according to different MLOs, factors and diseases here. (B) Browse page: users can browse all entries via combinatorial search. (C) Query result page: a responsive table of users’ search results. (D) Diseases page: all diseases are listed here. (E) MLOs page: all MLOs are displayed in a graphical navigation. (F) MLO detail page: basic information, related entries and components of the MLO are shown on this page. (G) Detailed page: detailed information for each entry.
Proteins account for the majority of MloDisDB factors, and many of them may form or regulate MLOs via LLPS. As shown in Figure 3C, 37 of 279 MloDisDB protein factors were known LLPS proteins [14–17], and the remaining 242 protein factors possessed significantly higher LLPS scores than the human proteome (human UniProt reviewed proteins, excluding known LLPS proteins and MloDisDB protein factors) (Figure 3D). The significantly higher scores of CatGRANULE, PScore and DisoRDPbind indicated that MloDisDB protein factors were prone to form RNA granules, had more pi–pi interactions and possessed more interactions with other proteins.
The results in Figure 3D indicated that many of the remaining 242 protein factors might be candidate LLPS proteins. To measure the LLPS potential of these protein factors, the scores of each LLPS-related predictor were quantile normalized, and the average of the normalized scores for each protein was used as the final score to find LLPS candidates. The candidates with the top 10 highest scores are listed in Figure 3E. Among these candidates, FBL has been proven in a recent study to form protein clusters and then assemble into the dense fibrillar component of the nucleolus in cells [55]; the same study also proved that purified FBL formed phase-separated droplets in vitro. ULK1 has been proven to form the pre-autophagosomal structure via LLPS in yeast in a recent paper published in Nature by Fujioka et al. [56]. Hrb98DE is the Drosophila melanogaster homolog of human hnRNPA2B1, and hnRNPA2B1 has been proven to undergo LLPS through its low-complexity domain [57]. NUP214 is a part of the nuclear pore complex, whose formation is dependent on LLPS [58], and Xenopus laevis nup214 has been proven to form hydrogels through FG domains [59]. CDKL5 is a cyclin-dependent kinase that functions in mitosis, when many structures assemble or disassemble via LLPS [60, 61]. Chd1 and CHD2 are chromatin remodelers that regulate the positioning of nucleosomes [62]; a previous study has shown that 10n spacing of nucleosomes strongly favors LLPS of chromatin [63]; therefore, they may regulate LLPS of chromatin via rearrangement of nucleosomes [64]. OTUD4 is an RNA-binding protein that localizes in stress granules [65, 66], which strongly suggests that OTUD4 can undergo LLPS.
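The ranking step can be sketched as follows. The text does not spell out the normalization procedure, so this sketch assumes standard quantile normalization (each predictor's scores are replaced by the mean of the values found at the same rank across all predictors, with tied scores sharing one value) and assumes that a higher score means a stronger LLPS tendency for every predictor. Function names and the toy numbers are hypothetical.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize score columns (dict: predictor name -> list of scores)."""
    n = len(next(iter(columns.values())))
    if any(len(col) != n for col in columns.values()):
        raise ValueError("every predictor must score the same proteins")
    # Reference distribution: mean of the k-th smallest score across predictors.
    sorted_cols = [sorted(col) for col in columns.values()]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = {}
    for name, col in columns.items():
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        start = 0
        while start < n:
            # Extend the group over tied scores so they share one value.
            end = start
            while end + 1 < n and col[order[end + 1]] == col[order[start]]:
                end += 1
            shared = mean(reference[start:end + 1])
            for pos in range(start, end + 1):
                out[order[pos]] = shared
            start = end + 1
        normalized[name] = out
    return normalized


def rank_candidates(proteins, columns, top=10):
    """Average the normalized scores per protein and return the top candidates."""
    normalized = quantile_normalize(columns)
    final = [mean(col[i] for col in normalized.values()) for i in range(len(proteins))]
    ranked = sorted(zip(proteins, final), key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


# Toy example with two predictors on very different scales.
toy = {"predictor_1": [1.0, 2.0, 3.0], "predictor_2": [30.0, 10.0, 20.0]}
top_hits = rank_candidates(["A", "B", "C"], toy, top=2)
```

Normalizing before averaging keeps a predictor with a wide numeric range from dominating the final score.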
The web application
A freely available and fully functional website has been developed to access the collection (Figure 4). The website comprises six sections: ‘Home’, ‘Browse’, ‘Diseases’, ‘MLOs’, ‘About’ and ‘Download’. Users can search or browse all data in MloDisDB by MLO, factor or disease on the ‘Home’ (Figure 4A) or ‘Browse’ (Figure 4B) page. The query results are presented as a responsive table, which contains MLO, factor, disease and entry groups (Figure 4C). All diseases are listed on the ‘Diseases’ page, where users can find a disease by human body system and click it to search for related entries (Figure 4D). The ‘MLOs’ page provides a user-friendly graphical navigation that enables users to browse based on the location of MLOs (Figure 4E); users can click each MLO to see the related entries and the components of the MLO on the ‘MLO detail’ page (Figure 4F). Users can click ‘LLPS’ on the ‘MLOs’ page (Figure 4E) to see LLPS–disease entries and known LLPS proteins.
The unique MloDisDB ID (MDID) on the ‘Browse’ page (Figure 4B), ‘Query result’ page (Figure 4C) and ‘MLO detail’ page (Figure 4F) can be clicked to navigate to the detailed page (Figure 4G), which includes four sections: (1) Basic information, which contains the MLO, factor and disease information. (2) The changes of the MLO and the description of the changes, the description of the relation, and the organisms and cell lines (or tissues) used in the experiment. (3) The changes of the factor that relate to the MLO–disease or LLPS–disease relations, and the description of the changes. (4) LLPS-related predictions, which are displayed in an interactive and scalable interface created with the neXtProt feature viewer [67] and plotly, so that users can easily find the regions with high LLPS scores and get the corresponding amino acid sequences of the regions of interest.
The user guide and data summary are provided on the ‘About’ page. All data in MloDisDB can be freely downloaded from the ‘Download’ page.
Discussion
Dysregulation of MLOs and LLPS has been shown to relate to a number of diseases, especially nervous system diseases and cancer [7]; in response to this, some LLPS-targeted therapeutic methods have been proposed [44, 68, 69]. MloDisDB is the first literature-based database of MLO–disease and LLPS–disease relations. With detailed information for MLOs, diseases and the functional factors, MloDisDB can be used to explore the assembly and regulatory mechanisms of MLOs, as well as the pathogenesis and treatment of related diseases. A number of protein factors in MloDisDB are known LLPS proteins. Furthermore, many of the remaining factors received significantly higher scores from LLPS-related predictors and are therefore plausible LLPS candidates. Those factors whose LLPS behaviors affect the pathogenesis or progression of diseases can be targeted to explore LLPS-related therapeutic methods. We believe that MloDisDB will be useful to physicians and researchers working on MLOs, LLPS and diseases and will facilitate future studies in this field.
Key Points
We developed MloDisDB, which contained 719 manually curated MLO–disease relations and 52 LLPS–disease relations from 607 publications.
Detailed information for all entries was extracted from original publications, such as cell lines used in experiments, the changes of MLOs, the functional factors and specific changes of the factors. The components of MLOs and LLPS related predictions were integrated in the database.
MloDisDB factors mainly regulated RNA and protein localization and transport and were strongly associated with nervous system diseases. Many of them had been proven to undergo LLPS, and a number of the others showed strong LLPS tendency.
A freely available and fully functional website was built to search, browse and download all data in MloDisDB.
Acknowledgements
We thank Chunyu Yu (Peking University, China) and Kaiqiang You (Peking University, China) for helpful suggestions.
Funding
National Key Research and Development Program of China (2018YFA0507504); National Natural Science Foundation of China (61773025, 32070666).
Chao Hou is a PhD student at the Department of Biomedical Informatics, Peking University Health Science Center. His research interests involve bioinformatics, protein degeneration, transcriptional regulation and liquid–liquid phase separation.
Haotai Xie is an undergraduate student at Peking University Health Science Center.
Yang Fu is an undergraduate student at Peking University Health Science Center.
Yao Ma is an undergraduate student at Peking University Health Science Center.
Tingting Li is an associate professor at the Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China. Her research interests involve bioinformatics, machine learning, protein post-translational modification and liquid–liquid phase separation.
Author notes
Chao Hou, Haotai Xie, Yang Fu and Yao Ma are equal contributors.