resendislab / corda

An implementation of genome-scale model reconstruction using Cost Optimization Reaction Dependency Assessment by Schultz et al.

Home Page: http://resendislab.github.io/corda

License: MIT License

Python 99.28% Batchfile 0.72%

metabolism reconstruction cobra cobrapy modeling metabolic-models flux biochemistry systems-biology genome

corda's Introduction

CORDA for Python

This is a Python implementation based on the papers of Schultz et al. with some added optimizations. It is based on the following two publications:

This Python package is developed in the Human Systems Biology Group of Prof. Osbaldo Resendis Antonio at the National Institute of Genomic Medicine Mexico and includes recent updates to the method (CORDA 2).

How to cite?

This particular implementation of CORDA has not been published so far. In the meantime, please cite the respective publications for the method mentioned above and provide a link to this GitHub repository.

What does it do?

CORDA, short for Cost Optimization Reaction Dependency Assessment, is a method for the reconstruction of metabolic networks from a given reference model (a database of all known reactions) and a confidence mapping for reactions. It allows you to reconstruct metabolic models for tissues, patients or specific experimental conditions from a set of transcription or proteome measurements.

How do I install it?

CORDA for Python works only with Python 3.4+ and requires cobrapy. Once you have a working Python installation with pip (Anaconda or Miniconda works fine here as well), you can install corda with pip:

pip install corda

This will download and install cobrapy as well. I recommend using a version of pip that supports manylinux builds for faster installation (pip>=8.1).

For now the master branch is usually working and tested, whereas each new feature is kept in its own branch. To install from the master branch directly, use:

pip install https://github.com/resendislab/corda/archive/master.zip

What do I need to run it?

CORDA requires a base model including all reactions that could possibly be included, such as Recon 1/2 or HMR. You will also need gene expression or proteome data for your tissue/patient/experimental setting. This data has to be translated into 5 distinct classes: unknown (0), not expressed/present (-1), low confidence (1), medium confidence (2) and high confidence (3). CORDA will then include as many high confidence reactions as possible and minimize the inclusion of absent (-1) reactions, all while maintaining a set of metabolic requirements.
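As a rough illustration of the translation step, the sketch below bins raw expression values into the five classes. The cutoffs (`not_expressed`, `medium`, `high`), the function name and the gene names are hypothetical placeholders chosen for this example. They are not part of CORDA, and sensible thresholds depend on your data.

```python
from typing import Dict, Optional


def to_confidence(expression: Optional[float],
                  not_expressed: float = 1.0,
                  medium: float = 10.0,
                  high: float = 50.0) -> int:
    """Map one expression value to a CORDA confidence class.

    The three cutoffs are illustrative assumptions, not CORDA defaults.
    """
    if expression is None:
        return 0    # unknown: no measurement available
    if expression < not_expressed:
        return -1   # not expressed/present
    if expression < medium:
        return 1    # low confidence
    if expression < high:
        return 2    # medium confidence
    return 3        # high confidence


# Hypothetical measurements; None marks a gene without data.
measurements: Dict[str, Optional[float]] = {
    "gene_a": 0.2,
    "gene_b": 4.0,
    "gene_c": 25.0,
    "gene_d": 120.0,
    "gene_e": None,
}

gene_confidence = {gene: to_confidence(value)
                   for gene, value in measurements.items()}
print(gene_confidence)
```

The resulting gene-level classes still have to be mapped onto reactions through the gene-reaction rules of the base model before they can be passed to CORDA.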

How do I use it?

You can follow the [introduction](docs/index.ipynb).

What's the advantage over other reconstruction algorithms?

No commercial solver needed

It does not require any commercial solvers. In fact, it works fastest with the free glpk solver that already comes with cobrapy. For instance, for the small central metabolism model (101 irreversible reactions) included with CORDA, the glpk version is about 3 times faster than the fastest tested commercial solver (cplex).

Fast reconstructions

CORDA for Python uses a strategy similar to FastFVA, where a previous solution basis is recycled repeatedly.

Some reference times for reconstructing the minimal growing models for iJO1366 (E. coli) and Recon3D:

Python 3.10.8 (main, Oct 24 2022, 10:07:16) [GCC 12.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from cobra.io import load_model

In [2]: from corda import benchmark

In [3]: ecoli = load_model("iJO1366")
Restricted license - for non-production use only - expires 2023-10-25

In [4]: opt = benchmark(ecoli)
Running setup for model `iJO1366`.
Running CORDA setup... ✔ [0.479 s]
Running CORDA build... ✔ [7.44 s]
Running validation on reduced model... ✔ [0.448 s]

In [5]: print(opt)
build status: reconstruction complete
Inc. reactions: 447/2583
- unclear: 0/0
- exclude: 446/2582
- low and medium: 0/0
- high: 1/1


In [6]: recon3 = load_model("Recon3D")

In [7]: opt = benchmark(recon3)
Running setup for model `Recon3D`.
Running CORDA setup... ✔ [2 s]
Running CORDA build... ✔ [13.7 s]
Running validation on reduced model... ✔ [1.68 s]

In [8]: print(opt)
build status: reconstruction complete
Inc. reactions: 114/10600
- unclear: 0/0
- exclude: 113/10599
- low and medium: 0/0
- high: 1/1

corda's Issues

Add changes from CORDA 2

Basically handling of redundancy changed.

Port to pytest

hide the internal model

Lots of confusion regarding CORDA.model. Better make it a private attribute so it is clear that this is not the reconstruction.

CS-Model keeps all the original metabolites

Hi,

I'm using CORDA to generate a context-specific model based on Recon2.2.

Recon2 dimensions: metabs=5324 rxns=7785
my_CSM: metabs=5324 rxns=4396

Thus my_CSM contains all the metabolites present in the original model.
In fact, if I execute the following line:

len([m for m in my_CSM.metabolites if len(m.reactions) == 0])

I get 2311 metabolites, which are not involved in any reaction and thus should not be part of the model.

Hope it helps.
Best
Miguel

corda.build() doesn't end

Hi! First of all thanks a lot for this implementation!

I am using CORDA to generate several context-specific models from microarray data, one model for each of the 36 individuals in my GEO dataset. I am using Recon2.2 as template.

The problem is that when I call CORDA.build(), it just goes on computing without ending.
There are no error messages whatsoever. I even let it run over the weekend, but nothing happened.
What is worse, the problem does not happen consistently: it shows up only for some of the 36 individuals in the dataset (with no apparent differences in the data among individuals).

I tried to hunt down the problem myself, I think the program hangs when the *.associated() method is called, at line 276 of corda.py.

Sorry for the poor description of the issue, I couldn't find a way to reproduce it here.
I hope you can at least point me in the right direction to solve this issue
If you want I can share some code and the reaction confidence

Best Regards

Andrea

make compatible with symengine

Remove all sympy-specific code.

Blocked reactions with High Confidence Score

Hi,

I'm using CORDA to generate a context-specific model based on Recon2.2.

After generating my CS-model, I found that it contains blocked reactions with a high confidence score
which are not blocked in Recon2.2. This looks a bit buggy to me. I'm reading the original paper on CORDA to get proper insight into the algorithm. Then I'll take a look at the code.

Hope to help
Best
Miguel

Easier install

The package should be installable with pip or conda (bioconda?).

accessing the corda solution

Great implementation, very useful thank you so much!
I'm wondering if it is possible to access the solution of corda algorithm directly?
As far as I understand the algorithm, corda is minimizing the flux through cost consuming reactions. Can I somehow get the flux values from this optimization?

Understanding the difference between low and medium confidence

Hello,

Sorry if this is not the appropriate place to ask this. I am trying to understand if reactions classed as low confidence (1) are treated differently to reactions classed as medium (2) or if they are ultimately lumped together in the algorithm?

If they are lumped could they be separated so that they are treated differently (i.e. medium confidence preferentially inserted over low confidence)?

Thank you for your help,

Ben

CORDA2?

Love the python version of CORDA but any chance you're planning on updating it to CORDA2? Specifically include the mfACHR components as well as the CORDA2 updates. Cheers.

corda metrics should be obtainable for the reversible model as well

Currently, calling print on a CORDA optimizer instance gives information in the following form:

build status: reconstruction complete
Inc. reactions: 2238/5351
 - unclear: 657/2485
 - exclude: 95/863
 - low and medium: 922/1345
 - high: 564/658

However, this is based on the irreversible models and does not consider that the reversible reaction is added along with all included reactions. So it would be better to change those metrics to the reversible case or allow for an option to specify the behavior.
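One way to picture the reversible metric is to count a reaction as included whenever either of its two irreversible halves is included. The following is a minimal sketch of that idea only. It assumes that the reverse half of a split reaction carries a hypothetical `_reverse` suffix, which may not match the naming CORDA uses internally, and the reaction IDs are made up.

```python
from typing import Iterable, Tuple


def base_id(reaction_id: str, suffix: str = "_reverse") -> str:
    """Strip the (assumed) reverse-direction suffix from a reaction ID."""
    if reaction_id.endswith(suffix):
        return reaction_id[:-len(suffix)]
    return reaction_id


def reversible_counts(all_ids: Iterable[str],
                      included_ids: Iterable[str],
                      suffix: str = "_reverse") -> Tuple[int, int]:
    """Return (included, total) counted on the reversible model.

    A reversible reaction counts as included if at least one of its
    irreversible halves is included.
    """
    total = {base_id(rid, suffix) for rid in all_ids}
    included = {base_id(rid, suffix) for rid in included_ids}
    return len(included), len(total)


# Hypothetical irreversible model: PGI and ENO are split, PFK is not.
all_ids = ["PGI", "PGI_reverse", "PFK", "ENO", "ENO_reverse"]
included_ids = ["PGI_reverse", "PFK"]

counts = reversible_counts(all_ids, included_ids)
print("Inc. reactions: {}/{}".format(*counts))
```

In this toy case the irreversible metric would report 2/5, while the reversible one reports 2/3.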

Add docs

At least some minimal documentation on how to use it.

Gene-Protein-Reaction rules are not consistent in the output model

Hi again,

I am creating some context-specific models (CSM) and then doing some in-silico single gene deletion experiments to find context-specific vulnerabilities. When I create a CSM I follow these steps:

Transform gene confidences into reactions confidences:
reactions_confidences = create_reactions_confidences(model, gene_confidences)
Run CORDA
corda = CORDA(model, reactions_confidences)
corda.build()
Finally, create the CSM
cs_model = corda.cobra_model()

The problem I've found is that the reactions included in the CSM inherit the Gene-Protein-Reaction (GPR) rules of the original model and thus include genes with low confidences that should not be in the model. Then, when the single gene KO experiments are conducted, some genes that should be predicted as essential in the CSM are not. I've found that this happens because the model includes other genes that code for the same reactions. For instance, the GPR of the enolase (ENO) is (HGNC:3350 or HGNC:3354 or HGNC:3353) and my gene confidences are {"HGNC:3350": 3, "HGNC:3353": -1, "HGNC:3354": -1}. ENO should then be present in the model, but I guess the GPR should only include genes with high confidence, plus some genes with low confidences only if they are required to enable flux through ENO. In the above example the ENO GPR in the CSM should be (HGNC:3350). That way, removing HGNC:3350 will imply zero flux through ENO. What do you think?

I have a Python-based implementation of Fastcore (I will put it on GitHub in the near future) and I have the same problem. Thus, I am trying to find a way to update the GPR after creating the CSM, so if I find some elegant way to solve this issue, I will share the idea with you.
I would like to hear your opinion on this issue.

Best regards,
Miguel
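The ENO example above can be sketched in a few lines. This is only an illustration of the proposed pruning, not something CORDA does: the function name `prune_or_rule` and the `min_confidence` cutoff are hypothetical, and the sketch handles only flat rules made of "or" (isozymes), not rules containing "and" or nested parentheses.

```python
import re
from typing import Dict


def prune_or_rule(rule: str,
                  gene_confidence: Dict[str, int],
                  min_confidence: int = 1) -> str:
    """Drop genes below `min_confidence` from a flat OR-only GPR rule.

    Genes missing from `gene_confidence` are treated as unknown (0).
    If no gene passes the cutoff, the rule is returned unchanged, since
    the reaction was kept by the reconstruction anyway.
    """
    if re.search(r"\band\b", rule, flags=re.IGNORECASE):
        raise ValueError("only flat OR rules are supported in this sketch")
    stripped = rule.strip().strip("()")
    genes = [g.strip() for g in re.split(r"\s+or\s+", stripped)]
    kept = [g for g in genes if gene_confidence.get(g, 0) >= min_confidence]
    if not kept:
        return rule
    return " or ".join(kept)


gene_confidences = {"HGNC:3350": 3, "HGNC:3353": -1, "HGNC:3354": -1}
eno_rule = "(HGNC:3350 or HGNC:3354 or HGNC:3353)"

pruned = prune_or_rule(eno_rule, gene_confidences)
print(pruned)
```

With the pruned rule, knocking out HGNC:3350 leaves no gene able to catalyze ENO, which is the behaviour described above. Rules with "and" would need a proper parser for the rule expression.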