We performed a systematic analysis of more than 20,000 human diseases and ranked them based on their association with genes (DisGeNet database) that encode known droplet-forming proteins and with disease-associated missense mutations (HuVarBase) in known components of membraneless organelles. By comparing the contributions of genes encoding condensate-forming proteins with those of genes whose missense mutations affect native structures, we identified over 5000 human disorders in which protein condensation could be expected to play a causative role. This list includes about 2000 orphan disorders linked with the dysfunction of multiple pathways.
We expect that identifying the protein condensation diseases reported here, and understanding their nature, will promote the development of effective strategies for their screening and treatment.
Ranking of human diseases based on their links with genes encoding droplet-forming proteins.
A total of 9277 diseases from the curated resources (Sheet 1) and 21552 diseases from all resources (Sheet 2) in the DisGeNet database were ranked by the fraction of disease-associated genes that encode droplet-forming proteins. The ranking evaluates the contribution of experimental (MLO) and predicted (PC) droplet-forming proteins to each disease by comparing it with that of non-condensate-forming proteins. We also list the orphan diseases (from the OrphaNet database) that are associated with condensate-forming proteins (Sheet 3), which may help identify the potential disease mechanism.
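The gene-based ranking can be illustrated with a short sketch. This is not the authors' pipeline: the disease-to-gene mapping and the MLO/PC gene sets below are hypothetical toy inputs standing in for DisGeNet associations and the lists of droplet-forming proteins, and the score is assumed to be simply the fraction of a disease's genes that encode experimental (MLO) or predicted (PC) droplet-forming proteins.

```python
from typing import Dict, List, Set, Tuple


def rank_by_condensate_fraction(
    disease_genes: Dict[str, Set[str]],
    mlo_genes: Set[str],
    pc_genes: Set[str],
) -> List[Tuple[str, float]]:
    """Rank diseases by the fraction of their genes that encode droplet-forming proteins."""
    droplet_genes = mlo_genes | pc_genes  # experimental (MLO) or predicted (PC)
    scores = []
    for disease, genes in disease_genes.items():
        if not genes:
            continue  # a disease with no associated genes has no defined fraction
        fraction = len(genes & droplet_genes) / len(genes)
        scores.append((disease, fraction))
    # Highest fraction first; ties broken by name so the order is deterministic
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy input with hypothetical gene identifiers (not DisGeNet data)
disease_genes = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G4", "G5"},
    "disease_C": {"G1", "G6"},
}
ranking = rank_by_condensate_fraction(disease_genes, mlo_genes={"G1", "G6"}, pc_genes={"G2"})
for disease, fraction in ranking:
    print(f"{disease}\t{fraction:.2f}")
```

Here disease_C ranks first (both of its genes are droplet-forming), followed by disease_A (two of three) and disease_B (none). The comparison against non-condensate-forming proteins described above is not modeled in this sketch.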
Curated data (Sheet 1): The 5803 diseases in the table are taken from the 9277 diseases in the curated resources of the DisGeNet database and ranked by the number of genes encoding droplet-forming proteins. The sheet lists the disease-associated genes and encoded proteins that are components of membraneless organelles (MLO) or are predicted to form droplets (PC).
All data (Sheet 2): The 16393 diseases in the table are taken from the 21552 diseases in all resources of the DisGeNet database and ranked by the fraction of genes encoding droplet-forming proteins.
Orphan diseases (Sheet 3): A list of rare (orphan) diseases in which at least one third of the contributing genes are associated with protein condensation.
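The "at least one third" criterion for Sheet 3 can be sketched as a simple filter. The orphan diseases, gene identifiers, and the set of condensate-associated genes below are hypothetical placeholders, not OrphaNet data; the function name is likewise illustrative.

```python
from fractions import Fraction
from typing import Dict, List, Set


def orphan_condensate_diseases(
    orphan_genes: Dict[str, Set[str]],
    condensate_genes: Set[str],
    threshold: Fraction = Fraction(1, 3),
) -> List[str]:
    """Return orphan diseases in which at least `threshold` of the contributing genes are condensate-associated."""
    selected = []
    for disease, genes in sorted(orphan_genes.items()):
        if not genes:
            continue  # no contributing genes, so no fraction to test
        # Exact rational arithmetic, so a disease with exactly 1/3 passes the ">= one third" test
        share = Fraction(len(genes & condensate_genes), len(genes))
        if share >= threshold:
            selected.append(disease)
    return selected


# Hypothetical orphan diseases and gene identifiers (not OrphaNet data)
orphan_genes = {
    "orphan_X": {"G1", "G2", "G3"},        # 1 of 3 condensate-associated
    "orphan_Y": {"G4", "G5", "G6", "G7"},  # 1 of 4 condensate-associated
}
print(orphan_condensate_diseases(orphan_genes, condensate_genes={"G1", "G4"}))
# ['orphan_X']
```

Using `Fraction` rather than floating-point division keeps the boundary case (exactly one third) unambiguous.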
Disease-associated missense mutations in droplet-promoting regions of experimentally identified droplet-forming proteins and diseases ranked by the contribution of missense mutations in droplet-promoting regions.
The causal role of condensate perturbations in protein condensation diseases is supported by disease-associated missense mutations that affect droplet-promoting regions (DPRs). Droplet-promoting regions are prone to form disordered interactions and facilitate the partitioning of proteins into condensates [1]. We therefore analyzed the positions in the protein sequence of 644,521 disease-associated missense mutations in 17,450 human proteins from the Human Variants Database (HuVarBase) [3]. We computed the fraction of missense mutations located in droplet-promoting regions, N_mut(DPR)/N_mut(tot), and, among experimentally identified condensate-forming proteins, identified those in which more than 70% of the missense mutations fall within droplet-promoting regions (Sheet 1). We then ranked the diseases associated with the proteins in Sheet 1 by the fraction of disease-associated missense mutations of droplet-forming proteins (Sheet 2). This ranking evaluates the contribution of missense mutations of experimental (MLO) and predicted (PC) droplet-forming proteins to each disease by comparing it with that of non-condensate-forming proteins.
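The per-protein fraction N_mut(DPR)/N_mut(tot) and the 70% cutoff can be sketched as follows. This assumes each DPR is given as a 1-based, inclusive residue interval and that each mutation is represented by its residue position; the protein names, positions, and DPR boundaries are hypothetical, and the actual DPR prediction method is not reproduced here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range of one DPR (assumed representation)


def dpr_fraction(positions: List[int], dprs: List[Interval]) -> float:
    """N_mut(DPR)/N_mut(tot): share of missense mutation positions inside any DPR."""
    if not positions:
        return 0.0
    n_dpr = sum(1 for pos in positions if any(start <= pos <= end for start, end in dprs))
    return n_dpr / len(positions)


def dpr_enriched_proteins(
    mutations: Dict[str, List[int]],
    dprs: Dict[str, List[Interval]],
    cutoff: float = 0.7,
) -> Dict[str, float]:
    """Proteins whose DPR mutation fraction exceeds `cutoff` (strictly greater, as in "> 70%")."""
    enriched = {}
    for protein, positions in mutations.items():
        frac = dpr_fraction(positions, dprs.get(protein, []))
        if frac > cutoff:
            enriched[protein] = frac
    return enriched


# Hypothetical proteins, mutation positions, and DPR boundaries
mutations = {"P1": [5, 12, 40, 41, 200], "P2": [10, 300]}
dprs = {"P1": [(1, 50)], "P2": [(1, 20)]}
print(dpr_enriched_proteins(mutations, dprs))
# {'P1': 0.8}
```

P1 passes because four of its five mutations lie in its DPR (0.8 > 0.7), whereas P2 has only one of two (0.5). In the analysis described above, the input would additionally be restricted to experimentally identified condensate-forming proteins.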
DPR mutations (Sheet 1): Proteins forming membraneless organelles (MLOs) in which over 70% of the disease-associated missense mutations are located in droplet-promoting regions. We also list the diseases associated with these proteins.
HuVarBase (Sheet 2): Diseases associated with proteins in which more than 70% of the missense mutations are in droplet-promoting regions. Diseases were ranked by the fraction of disease-associated missense mutations of droplet-forming proteins compared with that of non-condensate-forming proteins.
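A minimal sketch of the disease ranking by missense-mutation fraction is shown below. The exact comparison with non-condensate-forming proteins is not spelled out in the text, so this sketch assumes the score is the share of a disease's missense mutations that occur in droplet-forming proteins, with all remaining mutations counted as belonging to non-condensate-forming proteins. The (protein, position) input format and all identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_diseases_by_mutation_fraction(
    disease_mutations: Dict[str, List[Tuple[str, int]]],
    droplet_proteins: Set[str],
) -> List[Tuple[str, float]]:
    """Rank diseases by the share of their missense mutations that occur in droplet-forming proteins.

    Mutations in proteins outside `droplet_proteins` are treated as non-condensate.
    """
    ranking = []
    for disease, muts in disease_mutations.items():
        if not muts:
            continue  # no mutations, so no fraction to rank
        n_droplet = sum(1 for protein, _pos in muts if protein in droplet_proteins)
        ranking.append((disease, n_droplet / len(muts)))
    # Highest fraction first; ties broken by name for a deterministic order
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


# Hypothetical (protein, residue position) pairs per disease
disease_mutations = {
    "disease_A": [("P1", 5), ("P1", 12), ("P3", 7)],
    "disease_B": [("P3", 1), ("P4", 2)],
    "disease_C": [("P1", 40)],
}
for disease, fraction in rank_diseases_by_mutation_fraction(disease_mutations, {"P1"}):
    print(f"{disease}\t{fraction:.2f}")
```

With P1 as the only droplet-forming protein, disease_C (1.00) ranks above disease_A (0.67) and disease_B (0.00).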