modelseed / probmodelseed

License: Other

Perl 60.64% Python 1.60% JavaScript 0.28% Shell 0.04% Makefile 0.12% C++ 22.10% C 0.30% Ruby 4.73% Raku 10.21%

probmodelseed's Introduction

Repo for development of the ProbModelSEED service

SERVICE DEPENDENCIES: typecomp Workspace

SETUP

A workspace server must be up and running at the URL located in config file
make
if you want to run tests: make test
make deploy
fill in deploy.cfg and set KB_DEPLOYMENT_CONFIG appropriately
$TARGET/services/Workspace/start_service

If the server doesn't start up correctly, check /var/log/syslog for debugging information.

RUNNING SERVERS

Dev server on twig: https://p3c.theseed.org/dev1/services/ProbModelSEED
Production server on beech: https://p3.theseed.org/services/ProbModelSEED

probmodelseed's People

Contributors

Stargazers

Watchers

Forkers

mmundy42 samseaver bv-brc

probmodelseed's Issues

Need users' models from old modelseed?

I'm not really sure this is necessary if we had SBML import? It may alleviate some work if we focused on that instead... Not sure.

Create model templates using extended biochemistry

Write script to build model template from source files
Write script to generate SOLR tables from source files
Create gram positive model template
Create gram negative model template
Create plant model template
Create archaea model template
Update modeling code to only use model template (should not directly use biochemistry)
Update modeling code to handle translation of compartments from compartment-free biochemistry to model templates (and support multi-character compartment IDs?)

Add probabilistic annotation algorithm

Here's the design:

Add a ProbAnnotationWorker class that implements the algorithm. The class has methods that correspond to the steps in the algorithm as documented in the probabilistic annotation paper. Temporary files are stored in a separate job directory.
Add a ProbAnnotationParser class to access the static database files used by the algorithm. The static database files can either be downloaded from Shock or preloaded on the system. Note that creating the static database files is still done by the probabilistic_annotation service.
Add a ms-probanno script that retrieves the genome from the workspace, runs the algorithm, and stores the rxnprobs in the workspace.
Add a probanno parameter to the ModelReconstruction() method. When the probanno parameter is non-zero, store the genome in the model folder so it can be used as input to the probabilistic annotation and run the ms-probanno script.
Update the FBAModel object to include a rxnprobs_ref attribute. When the ms-probanno script is successful, the rxnprobs_ref is set to the rxnprobs object stored in the model folder.
Update the GapfillModel() method to pay attention to the probanno parameter. When the probanno parameter is non-zero, the rxnprobs are passed along to the MFAToolkit by building an objective coefficient file. This is a direct port of the previous code.

Need user's rast annotated genomes

PATRICStore cache is not limited in size

It looks like the cache in the PATRICStore object is allowed to grow to unlimited size. This isn't a problem when running as an app since everything goes away when the app ends. But when running in the server, the cache could grow quite large. For example, the get_model() method proposed in issue #22 will need to get model objects from the workspace. When lots of model objects are cached, the memory usage of the server could get out of control.
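One way to bound the cache would be a least-recently-used (LRU) eviction policy. The sketch below is in Python for brevity rather than the service's Perl, and the class name BoundedCache, the max_entries limit, and the get/put methods are all hypothetical; none of them come from PATRICStore.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache; not the PATRICStore implementation."""

    def __init__(self, max_entries=100):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, ref):
        """Return the cached object for ref, or None if it is not cached."""
        if ref not in self._items:
            return None
        self._items.move_to_end(ref)  # mark as most recently used
        return self._items[ref]

    def put(self, ref, obj):
        """Cache obj under ref, evicting the least recently used entry if full."""
        self._items[ref] = obj
        self._items.move_to_end(ref)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)
```

A limit on entry count is the simplest choice; since model objects vary a lot in size, a limit on total bytes might fit the server better.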

param lists

I'm following this spec (which may not be correct) to create html templates. https://github.com/olsonanl/app_service/blob/master/app_specs/FluxBalanceAnalysis.json

Are media_supplement, rxnko, and geneko truly strings? If so, could they be changed to lists at some point?

copy_model is overwriting

I'm thinking that copy_model should look at existing object names. If the same name already exists, it could create a model with name "my model (1)" or such.

Alternatively, the server could throw an error, and then the user would have to choose to overwrite the existing or name it whatever they want (with the original name already present in the input box).

I like the first option because the interaction is the fastest, but the patric workspace copy command works more like option 2. In any case, the copy_model command currently overwrites, which we probably don't want.

Thoughts?
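For reference, the first option could be sketched roughly as follows (Python for brevity; unique_copy_name is a hypothetical helper, and existing_names stands in for whatever listing the workspace returns for the destination folder):

```python
def unique_copy_name(name, existing_names):
    """Return name unchanged if unused, else the first free 'name (n)' variant."""
    taken = set(existing_names)
    if name not in taken:
        return name
    n = 1
    while f"{name} ({n})" in taken:
        n += 1
    return f"{name} ({n})"
```

With an existing "my model", a copy would be saved as "my model (1)", and a further copy as "my model (2)".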

client tests fail because model name is wrong

Test 5 is failing with an object not found error because the model created in test 3 has a different name.

get_fba (FBA Table) API Method

This is somewhat similar to get_model. I'm looking at a prototype here to come up with this spec: http://coremodels.mcs.anl.gov/app/#/fba/janakakbase:ATPF_Succi_aerobic/core_83333.1.fba.153

This is an opportunity to include everybody, @janakagithub, @samseaver, @jplfaria, @mdejongh.

Based on the table prototype above, we have,

{reaction_fluxes: [], exchange_fluxes: [], genes: [], biomass: []}

Each reaction_flux object is of the form:

{
    "name": "Carbonic acid hydro-lyase",
    "id": "rxn00102_c0",
    "flux": "0",
    "min": -1000,
    "max": 1000,
    "lower_bound": -1000,
    "upper_bound": -1000,
    "class": "unknown",              // I don't actually know what this is. Anybody?
    "eq": "H+[c0] + H2CO3[c0] <=> H2O[c0] + CO2[c0]",
    "def": <same thing as equation but with ids for the compounds>,
    "genes": [
        "fig|487976.3.peg.3553",
        "fig|487976.3.peg.4228",
        "fig|487976.3.peg.2871"
    ]
}

Each exchange_flux object is of the form:

{
    "id": "cpd00001_e0",
    "name": "H2O",
    "formula": "H2O",
    "exchange_flux": "=> H2O[e]",
    "min": -1000,
    "max": 1000,
    "lower_bound": -1000,
    "upper_bound": -1000,
    "charge": 0,
}

A gene has the form:

{
    "id": "fig|29459.15.peg.1915",
    "knocked_out": false,                   // This has never worked, as far as I know
    "growth_fraction": <some float?>
}

A compartment has the form:

{
    "id": "c0",
    "name": "Cytosol",
    "pH": 7,
    "potential": 0
}

And, biomass:

{
    "id": "bio1",
    "flux": 100,                    // Note: I don't see a value for this in core models
    "max_production": 100,          // Note: I don't see a value for this in core models
    "cpd_id": "cpd00005_c0",
    "name": "NADPH",
    "coefficient": -1.8225,
}

media_ref disappeared

I'm liking the new keys in list_fbas and list_gapfills, but it looks like when I unintegrated a gapfill solution, the media_ref is removed.

[
    {
        "integrated_solution": -1,
        "rundate": "2015-07-24T21:09:57",
        "solution_reactions": [],
        "media_ref": null,
        "ref": "/[email protected]/home/models/.435.3_model/gapfilling/gf.1",
        "integrated": 0,
        "id": "gf.1"
    },
    {
        "integrated_solution": 0,
        "rundate": "2015-07-24T19:05:16",
        "solution_reactions": [ ... ],
        "media_ref": "/chenry/public/modelsupport/patric-media/Complete",
        "ref": "/[email protected]/home/models/.435.3_model/gapfilling/gf.0",
        "integrated": 1,
        "id": "gf.0"
    }
]

list_fba_studies throws error when model has no associated fba studies

Request:

{"version":"1.1","method":"ProbModelSEED.list_fba_studies","id":"04267241479828954","params":[{"model":"/nconrad/models/Rhodobacter_sphaeroides_2.4.1.json_model"}]}:

Response:

{"version":"1.1","error":{"error":"Can't use an undefined value as an ARRAY reference at /disks/p3/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDImpl.pm line 116.\n","name":"JSONRPCError","code":-32603,"message":"Can't use an undefined value as an ARRAY reference"},"id":"04267241479828954"}

This should probably return an empty list. When there are associated FBA studies, the method is working.
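The guard being asked for amounts to treating a missing list as empty before it is dereferenced. A minimal sketch in Python (the model_meta dict and its "fbas" key are assumptions standing in for whatever the server reads from the workspace; in the Perl implementation the equivalent would be a defined-or default before the array dereference):

```python
def list_fba_studies(model_meta):
    """Return the FBA studies recorded for a model, or [] if none exist.

    model_meta is a hypothetical dict standing in for the data the server
    reads from the workspace; the "fbas" key is an assumption.
    """
    studies = model_meta.get("fbas")
    if studies is None:  # model has no associated FBA studies
        return []
    return list(studies)
```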

addCompoundFromHash() method in Biochemistry.pm does not support pkas and pkbs

The addCompoundFromHash() method does not support pkas and pkbs input arguments so there is no way to set them when adding a compound to a Biochemistry object.

delete_model() returns a list of ObjectMeta but spec defines just ObjectMeta

Since delete_model() takes a single model as input it makes sense that the output is just a single ObjectMeta. I'll update the method to return the correct data to match the spec.

Need published and reference models

list_models is quite slow

Putting a ticket in for this so it's not forgotten.

Reconstruction failing for particular genome

This may be related to the other failing jobs, not sure. I reconstructed "Bacillus amyloliquefaciens subsp. plantarum" on complete media and it failed with the error below. The app service job id was 829a6e2f-aedc-4348-b40e-d025f8a03372.

keys on reference is experimental at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 905.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3097.
Use of uninitialized value in concatenation (.) or string at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3098.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3099.
Use of uninitialized value in concatenation (.) or string at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3100.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3101.
Use of uninitialized value in concatenation (.) or string at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3102.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3108.
Use of uninitialized value in addition (+) at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3151.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3152.
Use of uninitialized value in addition (+) at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3153.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3154.
Use of uninitialized value in addition (+) at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3155.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3156.
Use of uninitialized value in addition (+) at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3157.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3158.
Use of uninitialized value in addition (+) at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3159.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3160.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3182.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3192.
Use of uninitialized value in split at /disks/p3/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBA.pm line 3199.
...

Array values in addReactionFromHash() and addCompoundFromHash()

In Bio::KBase::ObjectAPI::KBaseBiochem::Biochemistry there are addReactionFromHash() and addCompoundFromHash() methods for adding a single reaction or compound from an input hash. Several of the elements in the hash are arrays, which I don't understand. In addReactionFromHash(), names, id, direction, deltag, and deltagerr are arrays of values. In addCompoundFromHash(), names, id, mergeto, formula, unchargedFormula, mass, charge, deltag, and deltagerr are arrays of values. Why are those elements arrays when adding a single reaction or compound?

Gram positive and gram negative model templates are swapped

I dumped the /chenry/public/modelsupport/templates/GramNegative.modeltemplate and /chenry/public/modelsupport/templates/GramPositive.modeltemplate files and it looks like they are swapped.

In GramNegative.modeltemplate, there are these fields:

"name": "GramPosModelTemplate",
        "name": "GramPositiveBiomass",

And in GramPositive.modeltemplate there are these fields:

"name": "GramNegModelTemplate",
        "name": "GramNegativeBiomass",

The model templates are definitely different in terms of reactions. @cshenry, were the file names swapped?

Gapfills should always start from the same base model

As discussed at 6/29 meeting, GapfillModel() should always start from a model without other gap fills integrated. This makes each gap fill (say on different media) start from the same base model. For example, if a user submitted 5 gap fills on 5 different media it wouldn't make any difference if one of the gap fills completed before another gap fill started.

One thing we didn't discuss is how to handle model edits. Should model edits be integrated into the model before running a gap fill?

Implementing this change is a little tricky. When a model is retrieved using the get_objects() method in PATRICStore, the transform_model_from_ws() method is called implicitly. The transform_model_from_ws() method always processes gapfilling objects that it finds in the workspace. Should we add a new option to get_objects() to control how a model is transformed? The option would then have to be passed through to all of the transform_XXX methods that are defined.

Differences between media and patric-media folder in workspace

What is the difference between /chenry/public/modelsupport/media and /chenry/public/modelsupport/patric-media? The media folder matches what is shown on the website in the "Public Media" section of the Biochemistry tab. But the default in the ProbModelSEED server is to use the patric-media folder.

Here's what is in the patric-media folder:

> ws-ls /chenry/public/modelsupport/patric-media
Name     Owner  Type  Moddate             Size User perm Global perm
LB       chenry media 2015-06-12T04:34:10 0    r         r          
NMS      chenry media 2015-06-12T04:34:34 0    r         r          
SP4      chenry media 2015-06-12T04:35:08 0    r         r          
7H9      chenry media 2015-06-12T04:38:22 0    r         r          
Complete chenry media 2015-06-12T04:38:32 0    r         r          
GMM      chenry media 2015-06-12T04:38:55 0    r         r

mfatoolkit config variables not handled correctly

The constructor in the ProbModelSEEDHelper module does not handle the fbajobcache, fbajobdir, and mfatoolkitbin config variables. The corresponding function in the KBase::ObjectAPI::utilities module needs to be called so the values are passed along to the KBase::ObjectAPI modules.

I have a question about the fbajobcache config variable. In ProbModelSEEDImpl, the variable is required to be set in the configuration. But in KBase::ObjectAPI::KBaseFBA::FBA.pm the runFBA() method checks to see if the variable is set before saving the fba directory to a zip file. It seems that always saving the directory could cause problems with lots of data being saved in the long run.

Should the fbajobcache config variable be optional instead of required?

Lists in ModelStats structure are not returned by list_models()

In the spec file, a ModelStats structure includes four lists:

list<string> biomasses;
list<string> reactions;
list<string> genes;
list<string> biomasscpds;

But in the output ModelStats structures returned by list_models() the four lists are not there. Should the lists be returned? Or should they be removed from the structure?

get_model (Model Table) API Method

From email:

Looking at the model data, it would be something like this:

{reactions: Array[1051], compounds: Array[1103], genes: Array[1956], compartments: Array[2], biomass: Array[100]}

Where each reaction object (in the array) is of the form:

{
    "name": "L-Threonine acetaldehyde-lyase",
    "id": "rxn00541_c0",
    "eq": "L-Threonine[c0] <=> Acetaldehyde[c0] + Glycine[c0]",
    "def": <same thing as equation but with ids for the compounds>,
    "gpr": "fig|487976.3.peg.3553 and (fig|487976.3.peg.4228 or fig|487976.3.peg.2871)",
    "evidence": <some string>,
    "gapfill_dir": true,
    "gapfill_refs": [{id: "gap.3", direction: ">"}, 
                           {id: "gap.10", direction: "<"}, ... ],
    "genes": [
        "fig|487976.3.peg.3553",
        "fig|487976.3.peg.4228",
        "fig|487976.3.peg.2871"
    ]
}

Note: The format for equation and definition is debatable.

A compound has the form:

{
    "id": "cpd00113_c0",
    "name": "Isopentenyldiphosphate",
    "formula": "C5H10O7P2",
    "charge": -2,
}

A gene has the form:

{
    "id": "fig|29459.15.peg.1915",
    "rxns": ["rxn12008_c0", ... , "rxn03892_c0"]
}

A compartment has the form:

{
    "id": "c0",
    "name": "Cytosol",
    "pH": 7,
    "potential": 0
}

And, biomass:

{
    "id": "bio1",
    "cpdID": "cpd00084_c0",
    "name": "L-Cysteine",
    "coefficient": -0.0569540049395353,
    "compartment": "c0"
}
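Since each reaction record already lists its genes, the gene records in this spec could be derived from the reaction list rather than stored separately. A rough Python sketch, assuming reaction records shaped like the example above (genes_from_reactions is a hypothetical helper, not an existing server function):

```python
def genes_from_reactions(reactions):
    """Build gene records ({"id": ..., "rxns": [...]}) from reaction records."""
    index = {}
    for rxn in reactions:
        for gene_id in rxn.get("genes", []):
            rxns = index.setdefault(gene_id, [])
            if rxn["id"] not in rxns:  # avoid listing a reaction twice
                rxns.append(rxn["id"])
    return [{"id": gene_id, "rxns": rxns} for gene_id, rxns in index.items()]
```

Genes come out in the order they are first seen, so the result is stable for a given reaction list.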

Curate model templates

Curate gram negative model template
Curate gram positive model template
Curate plant model template
Curate archaea model template

Default charge of reactions containing unknown structures

Because we can, in theory, remove competing R groups from an equation and then balance its mass, we are sometimes still left with a charge imbalance, as with fatty-acyl-ACPs (particularly if protons are added or removed to balance the mass).

The problem is exacerbated because changing the charge of any one compound to fix a single reaction may result in a charge imbalance in another reaction. David Fell had an analogy for this: trying to flatten a carpet, only to have the same bump appear elsewhere on the floor.

Following on from the recent decision to default to a charge of zero for compounds with an unknown structure, we could make a decision concerning the charge balance of reactions that use such compounds. What I propose, quite simply, is that for reactions with unknown structures that may still be mass balanced, we ignore the charge balance.

As such, for any reaction with the status of 'CI', I can check that it has R groups on both sides of the equation, and update the status to be 'OK|RG|CI', where RG stands for R group (or some other preferred code).
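The proposed check could look roughly like this in Python. This is only a sketch under stated assumptions: it assumes each side of the reaction is available as a list of compound formulas, and that an R group shows up as a capital "R" that is not the start of an element symbol. The function names are hypothetical.

```python
import re

# Assumption: an R group appears in a formula as a capital "R" that is not
# the start of an element symbol such as Rb, Re, Rh, Rn, or Ru.
R_GROUP = re.compile(r"R(?![a-z])")


def has_r_group(formulas):
    """Return True if any formula in the list contains an R group."""
    return any(R_GROUP.search(f) for f in formulas)


def relax_charge_status(status, reactant_formulas, product_formulas):
    """Rewrite a 'CI' status to 'OK|RG|CI' when both sides carry R groups."""
    if status != "CI":
        return status
    if has_r_group(reactant_formulas) and has_r_group(product_formulas):
        return "OK|RG|CI"
    return status
```

Reactions with any other status, or with an R group on only one side, are left untouched.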

trailing space in autometadata reaction string

The reactions value looks something like this

"...rxn06937_c0/rxn09201_c0/rxn05429_c0/rxn02339_c0/rxn00776_c0/rxn05247_c0/rxn00184_c0/rxn02144_c0/rxn04954_c0/rxn00838_c0/rxn08551_c0/rxn02275_c0/rxn00849_c0/rxn05735_c0/rxn08817_c0/rxn01169_c0/rxn05223_c0/rxn00931_c0/rxn05163_c0/rxn00206_c0/rxn01457_c0/rxn09205_c0/rxn05425_c0/rxn01300_c0/rxn08820_c0/rxn08844_c0 "

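Until the trailing space in the reactions value is fixed at the source, a client can parse the value defensively. A small Python sketch (parse_reaction_ids is a hypothetical helper name):

```python
def parse_reaction_ids(value):
    """Split a '/'-delimited reaction string, ignoring stray whitespace."""
    return [part.strip() for part in value.split("/") if part.strip()]
```
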
Different genes IDs link to the same page

In the same viewer page, the "Genes" column displays two IDs, the fig ID and the PATRIC ID, but both link to the same PATRIC ID page.

list_models() and get_model() differences

list_models() returns the number of reactions and compounds that it got from the workspace list() function which is not accurate when there are integrated gap fills. get_model() returns the data from the integrated model so the length of the lists in the model_data structure is different.

The number of genes as calculated by the workspace list() function is different than the length of the list returned in the model_data structure. For genome 226186.12 list_models() says there are 1,159 genes but get_model() returns a list of 9,871 genes.

Also, there is no interface to get both model stats and model data in one function call. If you call list_models() to get model stats you'll get stats for all of the user's models. And if you call get_model() to get model data you don't get information like the name, genome, etc.

Can gapgens and gapfillings be removed from model object?

If I get a model object using the workspace get() method, the returned object data includes "gapgens" and "gapfillings" keys. Can those keys be removed since the mechanism for managing gapfillings and gapgens is different?

Tests need updates to be run by everyone

The tests in ProbModelSEEDTests.pm need to be updated so everyone can run them.

The genome /chenry/genomes/test/.Buchnera_aphidicola/Buchnera_aphidicola.genome should have public authority (or be moved to a public folder).
The user id in the paths in the tests should be configurable. It could just be a variable in the module that needs to be manually updated when running the tests.

Unrecognized reference format error

I'm trying to update a Model to use the genome that is stored in the workspace and I'm getting this error trying to get the model after the update:

JSONRPC error data:Unrecognized reference format:/reviewer/home/models/.PrivateGenomeModel/563178.138.genome at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/utilities.pm line 179.
Bio::KBase::ObjectAPI::utilities::error("Unrecognized reference format:/reviewer/home/models/.PrivateG"...) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/BaseObject.pm line 691
Bio::KBase::ObjectAPI::BaseObject::getLinkedObject(Bio::KBase::ObjectAPI::KBaseFBA::FBAModel=HASH(0x4b79e40), "/reviewer/home/models/.PrivateGenomeModel/563178.138.genome") called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/DB/FBAModel.pm line 73
Bio::KBase::ObjectAPI::KBaseFBA::DB::FBAModel::_build_genome(Bio::KBase::ObjectAPI::KBaseFBA::FBAModel=HASH(0x4b79e40)) called at accessor Bio::KBase::ObjectAPI::KBaseFBA::DB::FBAModel::genome (defined at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/DB/FBAModel.pm line 55) line 12
Bio::KBase::ObjectAPI::KBaseFBA::DB::FBAModel::genome(Bio::KBase::ObjectAPI::KBaseFBA::FBAModel=HASH(0x4b79e40)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDHelper.pm line 362

Why is that an unrecognized reference format?

New user issue

"Something seems to have went wrong. Please try logging out and back in again."

I get this error message when I go to the "My Models" tab when I have no models. We should say instead something like "you have 0 models".
This may be the cause:
{"version":"1.1","error":{"error":"Can't use an undefined value as an ARRAY reference at /disks/p3/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDImpl.pm line 142.\n","name":"JSONRPCError","code":-32603,"message":"Can't use an undefined value as an ARRAY reference"},"id":"8817047739867121"}
Returned by the server.

Reaction status lost when running Update_Reaction_Status multiple times

I accidentally ran Update_Reaction_Status.pl twice in a row and noticed that the status value changed even though no updates were made to the reaction. For example, here are the relevant fields for rxn00001 after running Update_Reaction_Status the first time:

code: (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0]
stoichiometry: -1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"
equation: (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]
status: OK|HB

And if you run Update_Reaction_Status again, the status field is reset to "OK" because no changes were required to mass and charge balance the reaction:

code: (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0]
stoichiometry:-1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"
equation: (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]
status: OK

The status should not be lost when checking the reaction again. Fixing requires an update to the checkReactionMassChargeBalance() method in a Reaction object to start with the current status and an update to the Update_Reaction_Status script to set the status in the temporary Reaction object to the current value.
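The idea of starting from the current status can be illustrated with a small Python sketch. It assumes the status is a '|'-separated list of flags, as in "OK|HB", and that each run reports only the flags it newly raised; update_status is a hypothetical name, and the real checkReactionMassChargeBalance() logic is not reproduced here.

```python
def update_status(current_status, new_flags):
    """Merge newly raised flags into the current status instead of resetting it.

    current_status is a '|'-separated string such as "OK|HB"; new_flags lists
    the flags raised by this run (empty when no changes were required).
    """
    flags = [f for f in current_status.split("|") if f]
    for flag in new_flags:
        if flag not in flags:
            flags.append(flag)
    return "|".join(flags) if flags else "OK"
```

With this approach a second run that raises no flags leaves "OK|HB" as it is. The sketch does not cover the case where a reaction really was edited and an old flag should be cleared.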

Issues with reconstruction on custom media

User tried to run reconstruction with custom media (built from UI) and gapfilling failed.

Standard error:

Use of uninitialized value $Item in concatenation (.) or string at /disks/p3c/deployment/lib/Bio/KBase/ObjectAPI/utilities.pm line 332.
Type retrieved (string) does not match specified type (media)! at /disks/p3c/deployment/lib/Bio/KBase/ObjectAPI/utilities.pm line 183.
Bio::KBase::ObjectAPI::utilities::error("Type retrieved (string) does not match specified type (media)!") called at /disks/p3c/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDHelper.pm line 196
Bio::ModelSEED::ProbModelSEED::ProbModelSEEDHelper::error(Bio::ModelSEED::ProbModelSEED::ProbModelSEEDHelper=HASH(0x3335140), "Type retrieved (string) does not match specified type (media)!") called at /disks/p3c/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDHelper.pm line 41
...

Switch to new extended biochemistry

Validate master biochemistry files
Update script to build SOLR tables
Generate updated SOLR tables and deliver to Malik

Add gapfilled reactions to model on integrate

Sorry I forgot this ticket.

From @cshenry...
It has a few ramifications:
1.) To avoid race conditions, I only want to allow one gapfilling operation to run at a time. To enforce this, we'll need some sort of "locking" mechanism (which is always a pain, but necessary).
2.) Integration and unintegration of gapfillings will be much slower than they are now...
3.) "list_models" and "get_model" will be much much faster... and more reliable.
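As a rough illustration of the "locking" in point 1, here is a Python sketch that uses an exclusively created lock file. It assumes a lock file on a local filesystem shared by all workers, which may not match how the workspace stores models; gapfill_lock and GapfillLocked are hypothetical names.

```python
import os
from contextlib import contextmanager


class GapfillLocked(Exception):
    """Raised when another gapfilling run already holds the lock."""


@contextmanager
def gapfill_lock(lock_path):
    """Hold an exclusive lock file for the duration of one gapfilling run."""
    try:
        # O_EXCL makes creation fail if the file already exists, so only one
        # process can create the lock file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise GapfillLocked(f"gapfilling already running: {lock_path}")
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A job would wrap its work in "with gapfill_lock(path):". A crashed job leaves a stale lock file behind, so a real mechanism would also need a timeout or an owner check.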

Need complete media data

I reported this in kbase too. No data is returned for /chenry/public/modelsupport/media/Complete. It's particularly confusing since complete media is the default.

put gapfill reactions in model object on integrating

... workspace.get will be so much faster for displaying models.

list_models and list_fba_studies could use meta data.

This may already be in the plans, not sure. Right now these methods return a list of references. It would be great if these calls returned more info on the models and fba studies, similar to this page: http://coremodels.mcs.anl.gov/app/#/models/ (click on a model to see the associated fba studies)

So for models, this would be a list of objects, something like:

{
  orgName: "Ecoli",
  ref: "workspace/path/to/file",
  rxnCount: 100,
  cpdCount: 100
}

For FBAs, this would be a list of objects, something like:

{
  mediaRef: "media/ref/path",
  media: "media name",  // if needed? (if the name is different than the ref name?)
  objective: (objective_value === '10000000' ? 0 : objective_value),  // note this is an issue in KBase
  rxnCount: 50,
  cpdCount: 50,
  maximized: true,
  biomass: "Max ATP"
}

Model edits are not taken into account when retrieving a model from workspace

The transform_model_from_ws() function takes gapfilling into account when retrieving a model from the workspace but model edits are not processed. See https://github.com/ModelSEED/ProbModelSEED/blob/master/lib/Bio/KBase/ObjectAPI/PATRICStore.pm#L403

get_object() always gets object metadata even when not needed

In the ProbModelSEEDHelper get_object() method, the metadata for the object is always retrieved. When the type parameter is undef, the metadata is not needed. Furthermore, the only part of the metadata that is needed is the type. Since the metadata can be quite large, there is extra overhead to return data that is unused.

Remove hard-coded paths to biochemistry and model template objects

In ProbModelSEEDHelper, there are hard-coded paths to the biochemistry, classifier, default media, and model template objects. To make it easier to test updated versions of the objects, I'm adding parameters to set the defaults and configuration variables to set them on deployments. Here are the defaults:

biochemistry=/chenry/public/modelsupport/biochemistry/default.biochem
default_media=/chenry/public/modelsupport/patric-media/Complete
classifier=/chenry/public/modelsupport/classifiers/gramclassifier.string
template_dir=/chenry/public/modelsupport/templates
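The lookup order could be sketched like this in Python. The precedence shown (call parameter, then configuration variable, then built-in default) is an assumption, and resolve_paths is a hypothetical helper rather than anything in ProbModelSEEDHelper.

```python
DEFAULTS = {
    "biochemistry": "/chenry/public/modelsupport/biochemistry/default.biochem",
    "default_media": "/chenry/public/modelsupport/patric-media/Complete",
    "classifier": "/chenry/public/modelsupport/classifiers/gramclassifier.string",
    "template_dir": "/chenry/public/modelsupport/templates",
}


def resolve_paths(config=None, params=None):
    """Return object paths, preferring call parameters, then config, then defaults."""
    resolved = dict(DEFAULTS)
    # Later sources win: config overrides defaults, params override config.
    for source in (config or {}, params or {}):
        for key in DEFAULTS:
            if source.get(key):  # ignore missing or empty values
                resolved[key] = source[key]
    return resolved
```
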

odd empty hash returned by copy_model

Not a big deal, but with the new copy_model command, there is an odd empty hash in the return structure. The key to the empty hash is the model path (i.e., /nconrad/plantseed/models/Vvinifera-IGGP_12x_Model) that was sent to the server. I think you can remove this?

no param checking

It appears that runfba will still run no matter what is sent to it? This is unfriendly and makes testing difficult.

Here I set 'minfluxsddf' to -23:

{"version":"1.1","method":"ProbModelSEED.FluxBalanceAnalysis","id":"0022221722174435854","params":[{"model":"/[email protected]/home/models/testgenome1.genome_model","media_supplement":[],"thermo_const_type":"None","minfluxsddf":-23}]}

Response:

{"version":"1.1","id":"0022221722174435854","result":[["fba.23","fba","/[email protected]/home/models/.testgenome1.genome_model/fba/","2015-06-04T23:08:47","A7854650-0B0E-11E5-9AEC-C3F8682E0674","[email protected]",15047,{"media":"/chenry/public/modelsupport/media/Complete","objective":0},{"inspection_started":"2015-06-04T23:08:47"},"o","n",""]]}
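A minimal form of parameter checking would reject keys the method does not recognize and report missing required ones. The Python sketch below uses a hypothetical allow-list; the real set of accepted parameters belongs to the service spec, and whether minfluxsddf is a real parameter is not established here.

```python
# Hypothetical subset of accepted parameters, for illustration only; the real
# list belongs to the service spec.
ALLOWED = {"model", "media", "media_supplement", "thermo_const_type", "rxnko", "geneko"}
REQUIRED = {"model"}


def check_params(params):
    """Return a list of problems with params; an empty list means none found."""
    problems = []
    for key in sorted(REQUIRED - set(params)):
        problems.append(f"missing required parameter: {key}")
    for key in sorted(set(params) - ALLOWED):
        problems.append(f"unknown parameter: {key}")
    return problems
```

Returning the problems in the JSON-RPC error response would let a client show them directly instead of receiving a result that silently ignores the bad input.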

compartments in reaction/compound names

This is not critical. I noticed that there are compartments in the reaction and compound names. Just wondering if those are really needed?

Example: "Isoheptadecanoyllipoteichoic acid (n=24), linked, glucose substituted_c0"

object_refs have vertical bars

Just pointing out that there are odd ||'s in media_ref and the modelcompound_ref's for fba data. This doesn't affect my work.

I still wonder why this data needs to be returned as a string. We should fix that later on if possible.

{
    "meta": [
        "fba.23",
        "fba",
        "/[email protected]/home/models/.testgenome1.genome_model/fba/",
        "2015-06-04T23:08:47",
        "A7854650-0B0E-11E5-9AEC-C3F8682E0674",
        "[email protected]",
        15047,
        {
            "media": "/chenry/public/modelsupport/media/Complete",
            "objective": 0
        },
        {
            "objective_function": "Max bio1",
            "allReversible": 0,
            "findMinimalMedia": 0,
            "model": "/[email protected]/home/models/testgenome1.genome_model",
            "simpleThermoConstraints": 0,
            "objectiveValue": 0,
            "comboDeletions": 0,
            "media": "/chenry/public/modelsupport/media/Complete",
            "fva": 0,
            "objectiveConstraintFraction": 1,
            "fluxMinimization": 0,
            "no_production_biomass_compounds": "cpd00003_c0/cpd00006_c0/cpd00016_c0/cpd15500_c0/cpd00220_c0/cpd15723_c0/cpd00052_c0/cpd00038_c0/cpd02229_c0/cpd15665_c0/cpd15560_c0/cpd15352_c0/cpd00357_c0/cpd00087_c0/cpd00015_c0/cpd00345_c0/cpd15750_c0/cpd15768_c0/cpd00062_c0/cpd00201_c0/cpd15722_c0/cpd15794_c0/cpd15795_c0/cpd00356_c0/cpd15748_c0/cpd15766_c0/cpd00028_c0/cpd15696_c0/cpd15695_c0/cpd15533_c0/cpd15749_c0/cpd15767_c0/cpd15540_c0/cpd15793_c0/cpd00241_c0/cpd15669_c0/cpd15757_c0/cpd15775_c0/cpd11459_c0/cpd15758_c0/cpd15667_c0/cpd15668_c0/cpd15776_c0/cpd15777_c0/cpd15759_c0/cpd00063_c0",
            "id": "fba.23",
            "thermodynamicConstraints": 0
        },
        "o",
        "n",
        ""
    ],
    "data": {
        "compoundflux_objterms": {},
        "media_ref": "/chenry/public/modelsupport/media/Complete||",
        "FBAMetaboliteProductionResults": [
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00161_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00033_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00002_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00010_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00051_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00001_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00003_c0",
                "maximumProduction": -1.13687e-13
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00060_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00035_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd11493_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00006_c0",
                "maximumProduction": 6.82121e-13
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00016_c0",
                "maximumProduction": 0
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00156_c0",
                "maximumProduction": 100
            },
            {
                "modelcompound_ref": "/[email protected]/home/models/testgenome1.genome_model||/modelcompounds/id/cpd00254_c0",
                "maximumProduction": 100
            }
.
.
.
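
For reading a dump like the one above, a short Python sketch can pull out the compounds whose `maximumProduction` is effectively zero. The key names `FBAMetaboliteProductionResults`, `modelcompound_ref`, and `maximumProduction` come from the output shown; the function name, the tolerance, and the placeholder model path are assumptions made for illustration and are not part of the ProbModelSEED code.

```python
def unproducible_compounds(data, tol=1e-9):
    """Return compound IDs whose maximumProduction is within tol of zero.

    `data` is assumed to be the dict stored under the "data" key of the
    FBA output. The tolerance is an illustrative choice that treats solver
    noise such as -1.13687e-13 as zero.
    """
    ids = []
    for entry in data["FBAMetaboliteProductionResults"]:
        if abs(entry["maximumProduction"]) <= tol:
            # The compound ID is the last path segment of the reference.
            ids.append(entry["modelcompound_ref"].rsplit("/", 1)[-1])
    return ids


# Hypothetical, shortened input that mirrors the structure of the dump.
MODEL = "/user/home/models/example_model"  # placeholder path
sample = {
    "FBAMetaboliteProductionResults": [
        {"modelcompound_ref": MODEL + "||/modelcompounds/id/cpd00161_c0",
         "maximumProduction": 100},
        {"modelcompound_ref": MODEL + "||/modelcompounds/id/cpd00003_c0",
         "maximumProduction": -1.13687e-13},
        {"modelcompound_ref": MODEL + "||/modelcompounds/id/cpd00006_c0",
         "maximumProduction": 6.82121e-13},
        {"modelcompound_ref": MODEL + "||/modelcompounds/id/cpd00016_c0",
         "maximumProduction": 0},
    ]
}

blocked = unproducible_compounds(sample)
print("/".join(blocked))
```

Joined with `/`, the result has the same shape as the `no_production_biomass_compounds` string in the output, which also begins with cpd00003_c0, cpd00006_c0, and cpd00016_c0.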

Unrecognized reference format: PATRICSOLR:216592.3/features/id/fig|216592.3.peg.3612

Tried running ModelReconstruction() and got this error:

biop3.ProbModelSEED.ProbModelSEEDClient.ServerError: JSONRPCError: -32603. Unrecognized reference format:PATRICSOLR:216592.3/features/id/fig|216592.3.peg.3612 at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/utilities.pm line 179.
    Bio::KBase::ObjectAPI::utilities::error("Unrecognized reference format:PATRICSOLR:216592.3/features/id"...) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/BaseObject.pm line 691
    Bio::KBase::ObjectAPI::BaseObject::getLinkedObject(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868), "PATRICSOLR:216592.3/features/id/fig|216592.3.peg.3612") called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/BaseObject.pm line 698
    Bio::KBase::ObjectAPI::BaseObject::getLinkedObjectArray(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868), ARRAY(0xdb2e258)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/DB/ModelReactionProteinSubunit.pm line 33
    Bio::KBase::ObjectAPI::KBaseFBA::DB::ModelReactionProteinSubunit::_build_features(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868)) called at accessor Bio::KBase::ObjectAPI::KBaseFBA::DB::ModelReactionProteinSubunit::features (defined at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/DB/ModelReactionProteinSubunit.pm line 27) line 10
    Bio::KBase::ObjectAPI::KBaseFBA::DB::ModelReactionProteinSubunit::features(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReactionProteinSubunit.pm line 25
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit::_buildgprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868)) called at accessor Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit::gprString (defined at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReactionProteinSubunit.pm line 17) line 10
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit::gprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProteinSubunit=HASH(0x20e9d868)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReactionProtein.pm line 36
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProtein::_buildgprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProtein=HASH(0x208f83a0)) called at accessor Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProtein::gprString (defined at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReactionProtein.pm line 17) line 10
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProtein::gprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReactionProtein=HASH(0x208f83a0)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReaction.pm line 106
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReaction::_buildgprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReaction=HASH(0x10da2d90)) called at accessor Bio::KBase::ObjectAPI::KBaseFBA::ModelReaction::gprString (defined at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/ModelReaction.pm line 25) line 10
    Bio::KBase::ObjectAPI::KBaseFBA::ModelReaction::gprString(Bio::KBase::ObjectAPI::KBaseFBA::ModelReaction=HASH(0x10da2d90)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBAModel.pm line 985
    Bio::KBase::ObjectAPI::KBaseFBA::FBAModel::printSBML(Bio::KBase::ObjectAPI::KBaseFBA::FBAModel=HASH(0x208f8a78)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/KBase/ObjectAPI/KBaseFBA/FBAModel.pm line 1465
    Bio::KBase::ObjectAPI::KBaseFBA::FBAModel::export(Bio::KBase::ObjectAPI::KBaseFBA::FBAModel=HASH(0x208f8a78), HASH(0x50d13a0)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDHelper.pm line 713
    Bio::ModelSEED::ProbModelSEED::ProbModelSEEDHelper::ModelReconstruction(Bio::ModelSEED::ProbModelSEED::ProbModelSEEDHelper=HASH(0x385adb8), HASH(0x37205e0)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/ModelSEED/ProbModelSEED/ProbModelSEEDImpl.pm line 1985
    Bio::ModelSEED::ProbModelSEED::ProbModelSEEDImpl::ModelReconstruction(Bio::ModelSEED::ProbModelSEED::ProbModelSEEDImpl=HASH(0x3729e60), HASH(0x37205e0)) called at /data2/microbiome/software/patric/dev-20150521/deployment/lib/Bio/ModelSEED/ProbModelSEED/Service.pm line 361

The code that saves the SBML version of the model to the workspace was added back in June, and I could have sworn I've run ModelReconstruction() since then, so I'm not sure what changed.
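
To make the difference between the two reference shapes concrete, here is a rough Python sketch that splits a reference string into its parts. It is not how `utilities.pm` or `BaseObject.pm` parse references; the function name and the returned tuple layout are assumptions. It only shows that the failing reference carries a `PATRICSOLR:` prefix and has no `||` separator, unlike the workspace-style references (`<path>||/modelcompounds/id/...`) that appear in FBA output.

```python
def describe_reference(ref):
    """Split a reference into (scheme, base, subpath).

    scheme  -- text before a ':' in the first path segment, or None
    base    -- the part before '||' (or the whole remainder if no '||')
    subpath -- the part after '||', or '' if there is none
    """
    scheme = None
    first_segment = ref.split("/", 1)[0]
    if ":" in first_segment:
        # e.g. "PATRICSOLR:216592.3/..." has a scheme-like prefix
        scheme, ref = ref.split(":", 1)
    if "||" in ref:
        base, subpath = ref.split("||", 1)
    else:
        base, subpath = ref, ""
    return scheme, base, subpath


# The reference from the error message above.
solr_ref = "PATRICSOLR:216592.3/features/id/fig|216592.3.peg.3612"
# A workspace-style reference; the model path is a placeholder.
ws_ref = "/user/home/models/example_model||/modelcompounds/id/cpd00161_c0"

print(describe_reference(solr_ref))
print(describe_reference(ws_ref))
```

Note that the single `|` inside the feature ID (`fig|216592.3.peg.3612`) is not a `||` separator, so the whole remainder stays in `base`.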

any media object additions for media editor?

@cshenry, @janakagithub, @jplfaria, @samseaver: It may be a good time to get all the data we need for editing media into the media objects. Can we have the compound names added? I think it makes sense to have the names readily available, particularly if custom compounds will be supported. Anything else? Charge? Below is all the data I have right now.

Issues with copy_model

I noticed a few issues with copy_model:

If I run with 'copy_genome' true, I get this error.

{"version":"1.1","error":{"error":"_ERROR_Object not found!ERROR\n\nTrace begun at /disks/p3/deployment/lib/Bio/P3/Workspace/WorkspaceImpl.pm line 137\nBio::P3::Workspace::WorkspaceImpl::_error('Bio::P3::Workspace::WorkspaceImpl=HASH(0x3e1d1c8)', 'Object not found!') called at /disks/p3/deployment/lib/Bio/P3/Workspace/WorkspaceImpl.pm line 207\nBio::P3::Workspace::WorkspaceImpl::_get_db_object('Bio::P3::Workspace::WorkspaceImpl=HASH(0x3e1d1c8)', 'HASH(0x4b3eef8)', 1) called at /disks/p3/d ...

Note that, as far as I can tell, it doesn't say which object was not found, an issue I've seen with the workspace service before.

If I run with 'copy_genome' false, I don't get an error. However, it seems to copy the genome regardless of what I put for 'copy_genome'.

- The workspace 'list' call says the genome size is 0 bytes, although it's really 25.7 MB.
- The spec doesn't actually say what the default for the 'copy_genome' option is: https://github.com/ModelSEED/ProbModelSEED/blob/master/ProbModelSEED.spec#L373
- "511145.12_model" was tacked on to the name I gave it. Personally I'd prefer if this didn't happen, but if it's really needed, we should at least mention this in the spec. [Being picky at this point]
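
When debugging responses like the 'copy_genome' error above, it can help to strip the stack trace and keep only the headline message. The Python sketch below assumes the envelope shape seen in that response (`{"version": ..., "error": {"error": "<message>\n\nTrace begun at ..."}}`). The function name is hypothetical, the sample trace line is shortened by hand, and the marker stripping is a guess based on the single message shown.

```python
import json
import re


def short_error(response_text):
    """Return the first line of the error message, or None if no error."""
    payload = json.loads(response_text)
    error = payload.get("error")
    if not error:
        return None
    # The response above nests the message under error["error"].
    raw = error.get("error", "") if isinstance(error, dict) else str(error)
    first_line = raw.split("\n", 1)[0]
    # Drop the leading "_ERROR_" and trailing "ERROR" markers if present.
    cleaned = re.sub(r"^_ERROR_", "", first_line)
    cleaned = re.sub(r"_?ERROR_?$", "", cleaned)
    return cleaned.strip()


# Hand-shortened sample with the same structure as the response above.
sample_response = json.dumps({
    "version": "1.1",
    "error": {
        "error": "_ERROR_Object not found!ERROR\n\n"
                 "Trace begun at WorkspaceImpl.pm line 137"
    },
})

print(short_error(sample_response))
```

This only shortens what the server already returns; it cannot recover which object was missing, which is the underlying complaint.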

reconstruct/gapfill/fba response

Per email discussion, if running reconstruct/gapfill/fba returned the same object as in the list methods, the UI could be updated on response without making additional requests for all of the user's data.