robert-haas / awesome-biomedical-knowledge-graphs

A curated list of biomedical knowledge graphs and of resources for their construction.

Home Page: https://robert-haas.github.io/awesome-biomedical-knowledge-graphs

License: Creative Commons Attribution Share Alike 4.0 International

Awesome biomedical knowledge graphs

This repository is inspired by awesome lists and follows the style guide of the awesome manifesto.

Introduction

This collection was compiled in two steps: 1) surveying academic and commercial projects that provide knowledge graphs in the domain of biomedicine, as well as resources for creating them, and 2) narrowing the survey down to a small subset that I consider awesome because of the quality or relevance of its results. I hope both collections serve you well! If you have suggestions or find an error, please don't hesitate to contact me or to contribute directly with a pull request.

Survey

A PDF report and accompanying website were created to present a comprehensive overview of available biomedical knowledge graphs and of resources for their construction.

Curated list

A carefully selected subset of the survey's entries is presented here in the style of an awesome list.

Biomedical knowledge graphs

  • Biomedical Data Translator - Publication (2022), Website, Code, API, Demo

    • Content:
      • A collection of harmonized APIs
    • Scope:
      • "integrated data from over 250 knowledge sources, each exposed via open application programming interfaces (APIs)"
      • "a diverse community of nearly 200 basic and clinical scientists, informaticians, ontologists, software developers, and practicing clinicians distributed over 11 teams and 28 institutions to form the Biomedical Data Translator Consortium"
    • Goals:
      • "integrate as many datasets as possible, using a ‘knowledge graph’–based architecture, and allow them to be cross-queried and reasoned over by translational researchers"
      • "integrating existing biomedical data sets and “translating” those data into insights intended to augment human reasoning and accelerate translational science"
      • "promote serendipitous discovery and augment human reasoning in a variety of disease spaces"
      • "federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions"
    • Sub-projects that construct knowledge graphs:
  • Bioteque - Publication (2022), Website, Code, Data

    • Content:
      • 450,000 nodes of 12 types
      • 30 million edges of 67 types
      • Extracted from 150 data sources
      • Provided as triples in multiple TSV files
    • Scope:
      • "a resource of unprecedented size and scope that contains pre-calculated embeddings derived from a gigantic heterogeneous network"
      • "Bioteque embeddings retain the information contained in the large biological network"
    • Goals:
      • "make biomedical knowledge embeddings available to the broad scientific community"
      • "evaluate, characterize and predict a wide set of experimental observations"
      • "assessment of high-throughput protein-protein interactome data"
      • "prediction of drug response and new repurposing opportunities"
  • CKG - Publication (2023), Website, Code, Data

    • Full name: Clinical Knowledge Graph
    • Content:
      • 20 million nodes
      • 220 million edges
      • Extracted from 26 databases, 10 ontologies, 7 million publications
      • Provided as Neo4j graph database
    • Scope:
      • "prior knowledge, experimental data and de-identified clinical patient information"
      • "harmonization of proteomics with other omics data while integrating the relevant biomedical databases and text extracted from scientific publications"
    • Goals:
      • "inform clinical decision-making"
      • "reveal candidate markers of prognosis and/or treatment"
      • "generate new hypotheses that ultimately translate into clinically actionable results"
      • "clinically meaningful queries and advanced statistical analyses"
      • "liver disease biomarker discovery"
      • "multi-proteomics data integration for cancer biomarker discovery and validation"
      • "prioritize treatment options for chemorefractory cases"
  • HALD - Publication (2023), Website, Code, Data

    • Full name: Human Aging and Longevity Dataset
    • Content:
      • 12,227 nodes of 10 types
      • 115,522 edges of various types
      • Extracted from 339,918 biomedical articles in PubMed
      • Provided as triples with additional information in multiple JSON and CSV files
    • Scope:
      • "a text mining-based human aging and longevity dataset of the biomedical knowledge graph from all published literature related to human aging and longevity in PubMed"
    • Goals:
      • "precision gerontology and geroscience analyses"
      • "provide predictions regarding the individuals’ lifespan under various treatment scenarios"
      • "devise novel, biologically-driven therapeutic and preventive strategies that address fundamental aging mechanisms"
  • Monarch KG - Publication (2024), Website, Code, Data

    • Naming explanation: "The name ’Monarch Initiative’ was chosen because it is a community effort to create paths for diverse data to be put to use for disease discovery, not unlike the navigation routes that a monarch butterfly would take."
    • Content:
      • 862,115 nodes of 88 types
      • 11,412,471 edges of 23 types
      • Extracted from 33 biomedical resources and biomedical ontologies and "updated with the latest data from each source once a month"
      • Provided in various formats such as SQLite, Neo4j, RDF, KGX
    • Scope:
      • "Monarch App includes an ETL platform for ingesting, harmonizing, and serving diverse life science data relating genes, phenotypes, and diseases into a semantic KG for use in various downstream applications"
      • "Monarch KG integrates gene, disease, and phenotype data"
      • "Monarch Assistant, which will combine the ability of LLMs to answer questions in plain language with Monarch’s extensive KG and analysis algorithms"
    • Goals:
      • "learn different things about the relationship between genotype and phenotype from different organisms"
      • "collect, integrate, and make a broad compendium of species and sources computable"
  • OREGANO - Publication (2023), Code, Data

    • Content:
      • 88,937 nodes of 11 types
      • 824,231 edges of 19 types
      • Extracted from various drug, protein and phenotype databases
      • Provided as triples in a TSV file
    • Scope:
      • "a holistically constructed knowledge graph using the broadest possible features and drug characteristics"
      • "integration of natural compounds (i.e. herbal and plant remedies)"
      • "incorporating together disease and drug information and natural compounds"
    • Goals:
      • "computational drug repositioning"
      • "generate hypotheses (molecule/drug - target links) through link prediction"
      • "from the available data, determine whether a drug is potentially capable of binding to a new target"
      • "identify possible repositionable molecules using machine learning (or more specifically deep learning) algorithms"
  • PharMeBINet - Publication (2022), Website, Code, Data

    • Full name: Pharmacological Medical Biochemical Network
    • Content:
      • 2,869,407 nodes of 66 types
      • 15,883,653 edges of 208 types
      • Extracted from 48 data sources
      • Provided as Neo4j graph database and GraphML file
    • Scope:
      • "heterogeneous information on drugs, ADRs, genes, proteins, gene variants, and diseases"
    • Goals:
      • "analysis of ADRs [Adverse Drug Reactions]"
      • "analysis of possible existing connections between gene variants and drugs"
  • PrimeKG - Publication (2023), Website, Code, Data

    • Full name: Precision Medicine Knowledge Graph
    • Content:
      • 129,375 nodes of 10 types
      • 4,050,249 edges of 30 types
      • Extracted from 20 data sources
      • Provided as triples in a CSV file
    • Scope:
      • "ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action"
      • "improves on coverage of diseases, both rare and common, by one-to-two orders of magnitude compared to existing knowledge graphs"
    • Goals:
      • "support research in precision medicine"
      • "linking biomedical knowledge to patient-level health information"
      • "personalized diagnostic strategies and targeted treatments"
      • "providing a holistic and multimodal view of diseases"
  • SPOKE - Publication (2023), Website, Code, API

    • Full name: Scalable Precision Medicine Open Knowledge Engine
    • Content:
      • 27,056,367 nodes of 21 types
      • 53,264,489 edges of 55 types
      • Extracted from 41 databases
      • Provided as a REST API that accepts graph queries, but "not available as a bulk download"
    • Scope:
      • "ranging from molecular and cellular biology to pharmacology and clinical practice"
      • "focuses on experimentally determined information"
      • "computational predictions and text mining from the literature are not currently prioritized"
    • Goals:
      • "applications relevant to precision medicine"
      • "provide insights into the understanding of diseases, discovering of drugs and proactively improving personal health"
      • "drug repurposing"
      • "disease prediction and interpretation of transcriptomic data"
      • "predict diagnosis"
      • "predict biomedical outcomes in a biologically meaningful manner"
  • SynLethKG - Publication (2021), Website, Code, Data

    • Full name: Synthetic Lethality Knowledge Graph
    • Content:
      • 54,012 nodes of 11 types
      • 2,231,921 edges of 24 types
      • Extracted from SynLethDB and various gene, drug and compound databases
      • Provided as triples in a CSV file
    • Scope:
      • "genes, compounds, diseases, biological processes and 24 kinds of relationships that could be pertinent to SL"
    • Goals:
      • "identify SL gene pairs"
      • "discovery of anti-cancer drug targets"
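
Several of the graphs above (for example Bioteque, OREGANO, PrimeKG and SynLethKG) are distributed as triples in TSV or CSV files. As a minimal sketch of how figures such as "nodes of N types" and "edges of M types" could be tallied from such a file, the following Python snippet summarizes a tiny in-memory sample. The column names (`head`, `head_type`, `relation`, `tail`, `tail_type`) and the sample rows are hypothetical; each project defines its own file layout, so its documentation should be consulted before adapting this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in a head/relation/tail layout; real files differ per project.
SAMPLE_TSV = """head\thead_type\trelation\ttail\ttail_type
drug_a\tdrug\ttargets\tgene_x\tgene
drug_a\tdrug\ttreats\tdisease_1\tdisease
gene_x\tgene\tassociated_with\tdisease_1\tdisease
drug_b\tdrug\ttargets\tgene_x\tgene
"""


def summarize_triples(handle, delimiter="\t"):
    """Count distinct nodes per node type and edges per relation type."""
    nodes = {}  # node identifier -> node type
    edge_types = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        nodes[row["head"]] = row["head_type"]
        nodes[row["tail"]] = row["tail_type"]
        edge_types[row["relation"]] += 1
    node_types = Counter(nodes.values())
    return node_types, edge_types


node_types, edge_types = summarize_triples(io.StringIO(SAMPLE_TSV))
print(dict(node_types))
print(dict(edge_types))
```

For a real download, the `io.StringIO` object would be replaced by an opened file and the delimiter set to `","` for CSV releases.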

Tools

  • BioCypher - Publication (2023), Website, GitHub, PyPI

    • Scope:
      • "a Python library that provides a low-code access point to data processing and ontology manipulation"
      • "a modular architecture that maximizes reuse of data and code in three ways: input, ontology and output"
      • "adhere to FAIR (Findable, Accessible, Interoperable and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability and Technology) principles"
    • Goals:
      • "make the process of creating a biomedical knowledge graph easier than ever, but still flexible and transparent"
      • "abstracting the KG build process as a combination of modular input adapters"
      • "provides easy access to state-of-the-art KGs to the average biomedical researcher"
      • "creating a more interoperable biomedical research community"
  • KGX - Website, GitHub, PyPI

    • Scope:
      • "a Python library and set of command line utilities"
      • "The core datamodel is a Property Graph (PG), represented internally in Python using a networkx MultiDiGraph model."
    • Goals:
      • "exchanging Knowledge Graphs (KGs) that conform to or are aligned to the Biolink Model"
      • "provide validation, to ensure the KGs are conformant to the Biolink Model"
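
KGX describes its core data model as a property graph held in a networkx MultiDiGraph, that is, a directed graph that allows parallel edges and attaches key-value properties to nodes and edges. To illustrate the idea without depending on KGX or networkx, here is a minimal standard-library sketch. The class `TinyPropertyGraph`, its methods, and the sample identifiers are hypothetical and are not part of the KGX API.

```python
from collections import defaultdict


class TinyPropertyGraph:
    """Directed multigraph whose nodes and edges carry property dicts."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = defaultdict(list)  # (source, target) -> list of edge properties

    def add_node(self, node_id, **props):
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, source, target, **props):
        # Endpoints are created on demand; parallel edges are kept, not merged.
        self.add_node(source)
        self.add_node(target)
        self.edges[(source, target)].append(props)

    def number_of_edges(self):
        return sum(len(parallel) for parallel in self.edges.values())


graph = TinyPropertyGraph()
graph.add_node("gene_x", category="gene", name="example gene")
graph.add_node("disease_1", category="disease")
# Two parallel edges between the same pair, distinguished by their properties.
graph.add_edge("gene_x", "disease_1", predicate="associated_with", source_db="db_a")
graph.add_edge("gene_x", "disease_1", predicate="causes", source_db="db_b")

print(graph.number_of_edges())
print(graph.edges[("gene_x", "disease_1")])
```

With networkx installed, `networkx.MultiDiGraph` offers the same semantics through `add_node` and `add_edge` with keyword attributes.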

Databases

Ontologies and controlled vocabularies

  • Collections

  • Biolink Model - Publication (2022), Website, Code

    • Scope:
      • "a unified data model that bridges across multiple ontologies, schemas, and data models"
      • "a map for bringing together data from different sources under one unified model, and as a bridge between ontological domains"
    • Goals:
      • "supported easier integration and interoperability of biomedical KGs"
      • "supports translation, integration, and harmonization across knowledge sources"
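
The idea of "bringing together data from different sources under one unified model" can be sketched as a lookup from source-specific type labels to shared categories. This is only an illustration of the harmonization step, not the Biolink Model tooling: the source names, the labels on the left-hand side and the function `harmonize` are invented, and the `biolink:`-prefixed categories on the right should be verified against the Biolink Model documentation.

```python
# Hypothetical mapping from source-specific type labels to shared categories.
# The left-hand labels are invented; verify the right-hand categories against
# the Biolink Model documentation before relying on them.
CATEGORY_MAP = {
    ("source_a", "GENE"): "biolink:Gene",
    ("source_b", "gene_symbol"): "biolink:Gene",
    ("source_a", "DISORDER"): "biolink:Disease",
    ("source_b", "drug"): "biolink:Drug",
}


def harmonize(records):
    """Attach a shared category to each record; collect unmapped ones separately."""
    harmonized, unmapped = [], []
    for record in records:
        key = (record["source"], record["type"])
        if key in CATEGORY_MAP:
            harmonized.append({**record, "category": CATEGORY_MAP[key]})
        else:
            unmapped.append(record)
    return harmonized, unmapped


records = [
    {"id": "a:1", "source": "source_a", "type": "GENE"},
    {"id": "b:7", "source": "source_b", "type": "gene_symbol"},
    {"id": "b:9", "source": "source_b", "type": "pathway"},
]
harmonized, unmapped = harmonize(records)
print([r["category"] for r in harmonized])
print([r["id"] for r in unmapped])
```

Keeping unmapped records in a separate list, rather than dropping them, makes gaps in the mapping visible during integration.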

File formats
