
y1zhou / metabolike


Identifying reprogrammed metabolic routes given omics data.

Home Page: https://metabolike.readthedocs.io/

License: GNU General Public License v3.0

Python 99.74% Dockerfile 0.26%
bioinformatics metabolic-pathways graph-database

metabolike's Introduction

Welcome to metabolike 👋



An alternative way to explore metabolic networks.

Metabolike lets you transform SBML metabolic models into queryable, interactive graphs.

Installation

pip install -U metabolike

For more details, please visit the documentation site.

Feature highlights

  • Parser that can read any valid SBML model and write relevant entities into a graph database.
  • Special support for BRENDA and BioCyc data files.
  • A Streamlit app for novel metabolic route detection with omics-data integration.

License

Metabolike is completely free and open-source and licensed under the GPL-3.0 license.

metabolike's People

Contributors

y1zhou


metabolike's Issues

Information about citations in `pubs.dat`

  • Map ID to UNIQUE-ID in the file; PUB- is prefixed
  • List attributes: AUTHORS
  • String attributes: DOI-ID, MEDLINE-UID, PUBMED-ID, REFERENT-FRAME, SOURCE, TITLE, URL, YEAR
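A minimal sketch of reading `pubs.dat` along these lines, assuming the usual BioCyc attribute-value layout (one `ATTR - value` per line, `//` ending each record, `#` comment lines). `parse_pubs` is a hypothetical helper, continuation lines are ignored, and the key with `PUB-` stripped may still need case normalization before it matches citation IDs used elsewhere (an assumption, not checked).

```python
from typing import Dict, List, Union

# Per the issue: AUTHORS may repeat and is kept as a list; the rest are single strings.
LIST_ATTRS = {"AUTHORS"}
STRING_ATTRS = {
    "UNIQUE-ID", "DOI-ID", "MEDLINE-UID", "PUBMED-ID",
    "REFERENT-FRAME", "SOURCE", "TITLE", "URL", "YEAR",
}
Record = Dict[str, Union[str, List[str]]]


def parse_pubs(text: str) -> Dict[str, Record]:
    """Parse pubs.dat-style records into {citation ID without "PUB-": record}."""
    pubs: Dict[str, Record] = {}
    record: Dict = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        if line.strip() == "//":  # end of one record
            uid = record.get("UNIQUE-ID", "")
            if uid:
                key = uid[len("PUB-"):] if uid.startswith("PUB-") else uid
                pubs[key] = record
            record = {}
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:
            continue  # continuation lines are ignored in this sketch
        if attr in LIST_ATTRS:
            record.setdefault(attr, []).append(value)
        elif attr in STRING_ATTRS:
            record[attr] = value
    return pubs
```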

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Documentation

Currently all documentation is automatically generated from the docstrings. The project should have standalone docs describing the workflow.

`classes.dat` contains the common name for a lot of the frames and IDs used in `pathways.dat`.

Anything not human-readable should have a corresponding UNIQUE-ID in classes.dat. For example, CCO-SURFACE-MAT is the cell surface matrix of CCO, and CCO is Cell Component Ontology.

  • Cellular compartment CCO
  • Common name for taxon, i.e. Organisms
  • Evidence codes in citations (Evidence). The mcId of Citation nodes is sometimes `:`-delimited and contains this part (related to #15)
  • Gene-Ontology-Terms
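A hedged sketch of the lookup described above. `split_citation_id` and `common_name` are hypothetical helpers; the `CITATION:EV-CODE[:...]` layout of `:`-delimited IDs is an assumption, and `classes` stands for a `{UNIQUE-ID: COMMON-NAME}` mapping built from `classes.dat`.

```python
from typing import Dict, Optional, Tuple


def split_citation_id(mc_id: str) -> Tuple[str, Optional[str]]:
    """Split "CITATION:EV-CODE[:...]" into (citation, evidence code or None).

    The field layout is an assumption; fields after the evidence code are ignored.
    """
    citation, _, rest = mc_id.partition(":")
    evidence = rest.split(":", 1)[0] or None
    return citation, evidence


def common_name(frame_id: str, classes: Dict[str, str]) -> str:
    """Human-readable name from a {UNIQUE-ID: COMMON-NAME} map of classes.dat."""
    return classes.get(frame_id, frame_id)  # fall back to the raw ID
```

The evidence code returned by `split_citation_id` can then be passed to `common_name` to label Citation nodes.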

Some reaction nodes are getting dropped

Related to #18 and c495b71. Unfortunately, for some reactions none of the duplicates carries the canonical ID. For example, FUMHYDR-RXN[CCO-CYTOSOL]-MAL//FUM/WATER.28. and FUMHYDR-RXN[CCO-CYTOSOL]-MAL//FUM/WATER.28. both exist for humans, but FUMHYDR-RXN doesn't exist.
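One possible way to keep these reactions is to derive the canonical ID from the annotated one and create (or link to) the missing canonical node. The sketch below assumes compartment-annotated IDs follow the `CANONICAL[COMPARTMENT]-LEFT//RIGHT.N.` shape seen in the examples; the pattern and helper names are hypothetical, and IDs without a compartment are deliberately not matched.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical pattern for IDs such as "FUMHYDR-RXN[CCO-CYTOSOL]-MAL//FUM/WATER.28.":
# canonical ID, [compartment], "-", left side, "//", right side, ".<n>."
ANNOTATED_RXN = re.compile(
    r"^(?P<canonical>[^\[\]]+)\[(?P<compartment>[^\[\]]+)\]-"
    r"(?P<left>.+?)//(?P<right>.+)\.(?P<n>\d+)\.$"
)


def canonical_of(rxn_id: str) -> Optional[str]:
    """Return the canonical reaction ID, or None if the ID has no compartment tag."""
    m = ANNOTATED_RXN.match(rxn_id)
    return m.group("canonical") if m else None


def group_by_canonical(rxn_ids: List[str]) -> Dict[str, List[str]]:
    """Group compartment-annotated duplicates under their canonical ID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rxn_id in rxn_ids:
        canonical = canonical_of(rxn_id)
        if canonical is not None:
            groups[canonical].append(rxn_id)
    return dict(groups)
```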

Link MetaCyc pathways to reactions

Unsure of the difference between reactions.dat and #reactions.dat#.

  • #8
  • Use pathways.dat to create annotation on Pathway nodes.
    • UNIQUE-ID are identifiers of the pathways, i.e. the BioCyc ID.
    • CITATIONS can have multiple entries and point to pubs.dat or Pathway Tools evidence ontology.
    • #11
    • IN-PATHWAY points to SuperPathway nodes. It seems to be the same as SUPER-PATHWAYS. Similarly, super pathways have a SUB-PATHWAY attribute.
    • PATHWAY-LINKS are tuples with the first element being metabolites in the pathway, and other elements being IDs of other pathways.
    • PREDECESSORS connects reactions in the pathway: the 2nd (and 3rd, etc.) elements are predecessors of the 1st element.
    • PRIMARY-REACTANTS and PRIMARY-PRODUCTS are the beginning and ending metabolites, respectively.
    • REACTION-LAYOUT gives the primary metabolites and direction of each reaction.
    • REACTION-LIST enumerates the reaction IDs in the pathway.
    • SPECIES enumerates taxa known to possess this pathway.
    • SYNONYMS are names for the pathway.
    • TAXONOMIC-RANGE is the expected taxonomic range.
  • #13
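A small sketch of turning one PREDECESSORS value into reaction-to-reaction edges, following the rule above that the 1st element is the reaction and the remaining elements are its predecessors. The Lisp-style `("RXN-B" "RXN-A")` value format and the `predecessor_edges` helper are assumptions.

```python
from typing import List, Tuple


def predecessor_edges(value: str) -> List[Tuple[str, str]]:
    """Turn one PREDECESSORS value into (predecessor, reaction) pairs.

    Assumes a flat Lisp-style list such as '("RXN-B" "RXN-A")'.
    """
    # Drop the surrounding parentheses, split on whitespace, remove quotes.
    tokens = [t.strip('"') for t in value.strip().strip("()").split()]
    if not tokens:
        return []
    reaction, *predecessors = tokens
    return [(pred, reaction) for pred in predecessors]
```

Each pair could then become a relationship between two Reaction nodes; the relationship type is left open here.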

Parse links in the `COMMENT` field

COMMENT corresponds to the "Summary" section on the website and often spans multiple lines containing links to various other attributes, e.g. |CITS:[Hill99]|. Once all the other nodes are created, a function should parse the COMMENT attribute and create relationships to those nodes.
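A rough sketch of the extraction step, assuming links appear as `|CITS: [Hill99]|` (possibly with several `[...]` groups) and `|FRAME: SOME-ID|`. Only the CITS form is taken from the example above; the FRAME form and the `comment_links` helper are assumptions.

```python
import re
from typing import Dict, List

# Assumed markup: "|CITS: [Hill99]|" (one or more [...] groups) and "|FRAME: ID|".
CITS_LINK = re.compile(r"\|CITS:\s*((?:\[[^\]]+\]\s*)+)\|")
FRAME_LINK = re.compile(r"\|FRAME:\s*([^|\s]+)\s*\|")
BRACKETED = re.compile(r"\[([^\]]+)\]")


def comment_links(comment: str) -> Dict[str, List[str]]:
    """Collect citation IDs and frame IDs referenced in a COMMENT value."""
    citations: List[str] = []
    for group in CITS_LINK.findall(comment):
        citations.extend(BRACKETED.findall(group))  # one ID per [...] group
    return {"citations": citations, "frames": FRAME_LINK.findall(comment)}
```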

Weight edges in route search

Currently, the graph search can only identify shortest paths. To be useful, edges need to be weighted based on omics data (gene expression, deleterious mutations, etc.) and thermodynamic information (the Gibbs free energy change of the reaction).
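To illustrate the idea, here is Dijkstra's algorithm over a metabolite graph whose edges are labelled with reaction IDs. The cost function `edge_weight` is purely hypothetical: it makes highly expressed reactions cheaper and penalizes positive Gibbs free energy changes, and it is kept non-negative because Dijkstra's algorithm requires that.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# metabolite -> [(next metabolite, reaction ID)]
Graph = Dict[str, List[Tuple[str, str]]]


def edge_weight(expression: float, gibbs: Optional[float], scale: float = 10.0) -> float:
    """Hypothetical non-negative cost for traversing one reaction."""
    cost = 1.0 / (1.0 + max(expression, 0.0))  # higher expression -> cheaper
    if gibbs is not None:
        cost += max(gibbs, 0.0) / scale  # penalize uphill (positive dG) steps
    return cost


def weighted_route(
    graph: Graph,
    start: str,
    goal: str,
    expression: Dict[str, float],
    gibbs: Dict[str, float],
) -> Optional[Tuple[List[str], float]]:
    """Dijkstra's algorithm using edge_weight; returns (path, cost) or None."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, rxn in graph.get(node, []):
            nd = d + edge_weight(expression.get(rxn, 0.0), gibbs.get(rxn))
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

With equal path lengths, the route through more highly expressed reactions wins until a large positive dG makes it more expensive.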

Annotation of compounds

Compound nodes with (c:Compound)-[:is]->(rdf:RDF) relationships can be mapped to items in the compounds.dat file using:

  • rdf.Biocyc to file UNIQUE-ID (strip the META: prefix).
  • rdf.Inchikey to file INCHI-KEY (strip InChIKey= prefix in file).
  • Other identifiers in RDF to DBLINKS in file; this is harder to parse and should only be considered when the other two don't give matches.

Once we have a match, the following information could be added:

  • c node: GIBBS-0, LOGP, MOLECULAR-WEIGHT, MONOISOTOPIC-MW, POLAR-SURFACE-AREA, SYNONYMS, PKA[123], COMMENT
  • rdf node: SMILES, INCHI, DBLINKS
  • Links to other nodes: CITATIONS
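A sketch of the matching order described above: BioCyc ID first, then InChIKey, with DBLINKS left out. The RDF property names `Biocyc` and `Inchikey` come from the issue; representing records as plain dicts and the helper names are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Record = Dict[str, str]
Index = Dict[str, Record]


def _strip(value: str, prefix: str) -> str:
    return value[len(prefix):] if value.startswith(prefix) else value


def build_indexes(compounds: Iterable[Record]) -> Tuple[Index, Index]:
    """Index compounds.dat records by UNIQUE-ID and by bare InChIKey."""
    by_id: Index = {}
    by_inchikey: Index = {}
    for rec in compounds:
        if "UNIQUE-ID" in rec:
            by_id[rec["UNIQUE-ID"]] = rec
        if "INCHI-KEY" in rec:
            by_inchikey[_strip(rec["INCHI-KEY"], "InChIKey=")] = rec
    return by_id, by_inchikey


def match_compound(rdf: Dict[str, str], by_id: Index, by_inchikey: Index) -> Optional[Record]:
    """BioCyc ID first, then InChIKey; DBLINKS matching is not attempted."""
    biocyc = rdf.get("Biocyc")
    if biocyc:
        rec = by_id.get(_strip(biocyc, "META:"))
        if rec is not None:
            return rec
    inchikey = rdf.get("Inchikey")
    if inchikey:
        return by_inchikey.get(_strip(inchikey, "InChIKey="))
    return None
```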

Improve BRENDA load time

Dask is used to parallelize the Lark parse-tree generation, but running map_partitions on the multi-threaded client doesn't seem to speed anything up (likely because Lark parsing is pure-Python, CPU-bound work that holds the GIL). Consider using multiple processes (simply switching causes weird bugs) or @delayed function calls.
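As a sketch of the multi-process route, here is a version using the standard library's `ProcessPoolExecutor` instead of Dask (a swapped-in technique, not what the project currently uses). It assumes BRENDA entries end with a `///` line and that `parse_entry` is a picklable, module-level function wrapping the Lark parser; both helper names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def split_entries(text: str) -> List[str]:
    """Split a BRENDA flat file into entries, assuming each ends with "///"."""
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "///":  # end of one enzyme entry
            if current:
                entries.append("\n".join(current).rstrip("\n"))
            current = []
        elif current or line.strip():  # skip blank lines between entries
            current.append(line)
    if current:  # trailing entry without a closing "///"
        entries.append("\n".join(current).rstrip("\n"))
    return entries


def parse_parallel(
    entries: List[str],
    parse_entry: Callable[[str], T],
    workers: Optional[int] = None,
    chunksize: int = 16,
) -> List[T]:
    """Apply parse_entry to every entry in separate worker processes.

    parse_entry must be a picklable module-level function; call this under
    `if __name__ == "__main__":` so platforms using "spawn" work correctly.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_entry, entries, chunksize=chunksize))
```

Building the Lark parser once per worker process (e.g. via the executor's `initializer` argument) rather than once per entry would avoid repeating grammar compilation.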

Duplicated canonical reactions

Some reactions have identical canonical reaction IDs, e.g. PGLUCISOM-RXN and PGLUCISOM-RXN-ALPHA-GLC-6-P//FRUCTOSE-6P.27.. This is caused by metabolic reactions with annotated substrates and/or cellular compartments in their IDs.

These are not the same reactions, but during the annotation step (`def reaction_to_graph(self, rxn_id: str, rxn_dat: Dict[str, List[List[str]]])`) the two would get exactly the same information added. Is this the right thing to do?
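To see how widespread this is, the sketch below resolves each SBML reaction ID to the longest known canonical ID it extends (followed by `[`, `-`, or nothing) and reports canonical IDs shared by several reactions. `known` is assumed to hold the UNIQUE-IDs from `reactions.dat`; the helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def resolve_canonical(rxn_id: str, known: Set[str]) -> Optional[str]:
    """Return the longest known canonical ID that rxn_id equals or extends."""
    best: Optional[str] = None
    for canonical in known:
        extends = rxn_id.startswith(canonical) and (
            rxn_id == canonical or rxn_id[len(canonical)] in "[-"
        )
        if extends and (best is None or len(canonical) > len(best)):
            best = canonical
    return best


def shared_canonicals(rxn_ids: Iterable[str], known: Set[str]) -> Dict[str, List[str]]:
    """Canonical IDs that several distinct SBML reactions resolve to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rxn_id in rxn_ids:
        canonical = resolve_canonical(rxn_id, known)
        if canonical is not None:
            groups[canonical].append(rxn_id)
    return {c: ids for c, ids in groups.items() if len(ids) > 1}
```

Every group returned here is a set of reactions that would receive identical annotation.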

Reaction ID mismatch

Some reactions have longer ID forms in metabolic-reactions.xml than in reactions.dat or pathways.dat. For example, F16BDEPHOS-RXN has two counterparts in metabolic-reactions.xml:

# all_rxns: reaction IDs parsed from metabolic-reactions.xml
[x for x in all_rxns if x.startswith("F16BDEPHOS-RXN")]
# [
#    'F16BDEPHOS-RXN[CCO-PERI-BAC]-FRUCTOSE-16-DIPHOSPHATE/WATER//FRUCTOSE-6P/Pi.60.',
#    'F16BDEPHOS-RXN[CCO-CYTOSOL]-FRUCTOSE-16-DIPHOSPHATE/WATER//FRUCTOSE-6P/Pi.59.'
# ]
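A slightly stricter version of this lookup, as a sketch: it only accepts a long form when the short ID is followed by a `[compartment]` tag or by `-` plus a substrate annotation containing `//`, so unrelated IDs that merely share a prefix are skipped. The helper name and the exact boundary rule are assumptions.

```python
from typing import Iterable, List


def find_long_forms(short_id: str, all_rxns: Iterable[str]) -> List[str]:
    """Return SBML reaction IDs that are annotated variants of short_id."""
    matches: List[str] = []
    for rxn in all_rxns:
        if rxn == short_id:
            matches.append(rxn)
            continue
        if not rxn.startswith(short_id):
            continue
        rest = rxn[len(short_id):]
        # Compartment tag, or "-" followed by a substrate annotation.
        if rest.startswith("[") or (rest.startswith("-") and "//" in rest):
            matches.append(rxn)
    return matches
```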

Reversibility mismatch in SBML and dat file

Here are the statistics for HumanCyc:

reactionDirection          reversible   COUNT(*)
"left_to_right"            false        871
"physiol_left_to_right"    false        1519
"physiol_right_to_left"    false        172
"right_to_left"            false        87
"reversible"               true         239
"reversible"               false        17
null                       true         146
"physiol_left_to_right"    true         1

The reactionDirection field comes from reactions.dat, and reversible comes from the SBML file. To match with the results shown in HumanCyc:

  • When reactionDirection is "reversible" but reversible is false, the reaction is reversible.
  • When reactionDirection is missing and reversible is true, the direction is unknown.
  • When reactionDirection is anything other than "reversible" and reversible is true, the reaction is irreversible.

In summary, the reversible property is not reliable and can be dropped.
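The rules above collapse into a function of reactionDirection alone. A minimal sketch, assuming the lower-case direction strings shown in the table; the return labels are hypothetical, and a missing direction is treated as unknown regardless of the SBML flag (the table has no null/false rows).

```python
from typing import Optional

REVERSIBLE = "reversible"
IRREVERSIBLE = "irreversible"
UNKNOWN = "unknown"


def effective_reversibility(reaction_direction: Optional[str]) -> str:
    """Classify a reaction from reactions.dat's direction; SBML `reversible` is ignored."""
    if reaction_direction is None:
        return UNKNOWN
    if reaction_direction == "reversible":
        return REVERSIBLE
    # e.g. "left_to_right", "physiol_right_to_left"
    return IRREVERSIBLE
```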

Reaction sides don't match in SBML and dat file

For example, in PEPDEPHOS-RXN pyruvate is listed on the products (right) side, but in the reactions.dat file it is on the left-hand side. This renders reactionDirection useless, because we don't really know which metabolites are on which side.
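One way to detect such cases is to compare the metabolite sets on each side. A hedged sketch, assuming both sources have already been mapped to the same metabolite IDs; `side_alignment` is hypothetical.

```python
from typing import Iterable, Optional


def side_alignment(
    sbml_reactants: Iterable[str],
    sbml_products: Iterable[str],
    dat_left: Iterable[str],
    dat_right: Iterable[str],
) -> Optional[str]:
    """Return "same", "swapped", or None if the sides can't be aligned."""
    reactants, products = set(sbml_reactants), set(sbml_products)
    left, right = set(dat_left), set(dat_right)
    if (reactants, products) == (left, right):
        return "same"
    if (reactants, products) == (right, left):
        return "swapped"
    return None  # e.g. metabolite IDs differ between the two sources
```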

[MetaCyc] GIBBS-0 and STD-REDUCTION-POTENTIAL in reversible reactions

When parsing the reactions.dat file, we're matching the entries to the nodes in the graph based on the UNIQUE-ID and metaId. However, the sides in reactions.dat and metabolic-reactions.xml could be swapped, i.e. LEFT in the dat file could be Product in the SBML file.

For irreversible reactions, this is fixed in MetacycClient.fix_reaction_direction where we always assume the directions in the SBML file are true. For reversible reactions, this becomes a problem as GIBBS-0 and STD-REDUCTION-POTENTIAL could be reversed.
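Since reversing a reaction flips the sign of its standard Gibbs free energy change, one option is to negate GIBBS-0 whenever the reactions.dat sides are swapped relative to the SBML file. A sketch under these assumptions: the record is a dict with `LEFT`/`RIGHT` lists and a numeric `GIBBS-0` string, and metabolite IDs already agree between sources. STD-REDUCTION-POTENTIAL is left out because its sign convention on reversal isn't checked here; the helper name is hypothetical.

```python
from typing import Dict, List, Optional, Union

DatRecord = Dict[str, Union[str, List[str]]]


def gibbs_in_sbml_direction(
    record: DatRecord, sbml_reactants: List[str], sbml_products: List[str]
) -> Optional[float]:
    """GIBBS-0 expressed in the direction the SBML file writes the reaction."""
    raw = record.get("GIBBS-0")
    if not isinstance(raw, str):
        return None
    value = float(raw)
    left, right = set(record.get("LEFT", [])), set(record.get("RIGHT", []))
    reactants, products = set(sbml_reactants), set(sbml_products)
    if (left, right) == (reactants, products):
        return value
    if (left, right) == (products, reactants):
        return -value  # reversed reaction: sign of dG flips
    return None  # sides can't be aligned; don't guess
```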
