sbrg / cobradb

COBRAdb loads genome-scale metabolic models and genome annotations into a relational database. It already powers BiGG Models, and it is available under the MIT license.

Home Page: http://bigg.ucsd.edu

License: Other

Python 99.83% Shell 0.17%

python database sqlalchemy

cobradb's Issues

Quick speedup in loading/model_loading.py

Hi there,

I just realized that the way compartments are being uploaded is quite inefficient:

def loadCompartments(self, modellist, session):
    for model in modellist:
        for component in model.metabolites:
            if component.id is not None:
                if not session.query(Compartment).filter(Compartment.name == component.id[-1:len(component.id)]).count():
                    compartmentObject = Compartment(name = component.id[-1:len(component.id)])
                    session.add(compartmentObject)

This approach issues ~1000 queries (one per metabolite), which takes around 5-10 s.
Since the number of distinct compartments is small, I believe this approach might be better:

def loadCompartments(self, modellist):
    # Collect the distinct compartment symbols first (last character of each metabolite id).
    compartments_all = set()
    for model in modellist:
        for component in model.metabolites:
            if component.id is not None:
                compartments_all.add(component.id[-1])
    # Then query the database once per distinct symbol, not once per metabolite.
    for symbol in compartments_all:
        if not self.session.query(Compartment).filter(Compartment.symbol == symbol).count():
            compartmentObject = Compartment(symbol = symbol)
            self.session.add(compartmentObject)

This generates only as many queries as there are compartments in a model (usually 3-5).

Anyway, it's only a static 5-10 s speedup, but over time it adds up to a lot if you do intensive testing. It is also easier on your DB, if you're the DB-happiness-caring type like I am.

Best,

Pierre
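
The idea can be taken one step further by comparing the collected symbols against the symbols already stored, so that a single lookup replaces the per-symbol queries. The following is only a minimal sketch under assumptions: `Metabolite` and `Model` are stand-in tuples, not the cobrapy classes, and `existing` stands for the result of one query such as `session.query(Compartment.symbol)`, which is not run here.

```python
from collections import namedtuple

# Stand-ins for cobrapy objects (hypothetical, for illustration only).
Metabolite = namedtuple('Metabolite', ['id'])
Model = namedtuple('Model', ['metabolites'])


def collect_compartment_symbols(modellist):
    """Return the set of compartment symbols (last character of each metabolite id)."""
    symbols = set()
    for model in modellist:
        for metabolite in model.metabolites:
            if metabolite.id:  # skip None and empty ids
                symbols.add(metabolite.id[-1])
    return symbols


def missing_symbols(symbols, existing):
    """Return the symbols that are not stored yet, sorted for reproducibility."""
    return sorted(set(symbols) - set(existing))
```

Only the symbols returned by `missing_symbols` would then need a `Compartment(symbol=...)` row.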

We should use a named schema

A named schema helps enable queries across multiple PostgreSQL schemas for users who wish to integrate cobradb into other systems. This can be done with __table_args__ = {'schema' : 'cobradb'} in the class definitions.
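
As a rough illustration of the suggestion, here is a minimal declarative class that places its table in a named schema. The class is a simplified stand-in and not the actual cobradb model, and the import assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+

Base = declarative_base()


class Compartment(Base):
    """Simplified stand-in for a cobradb table (columns are illustrative)."""
    __tablename__ = 'compartment'
    # The dictionary form of __table_args__ passes keyword arguments to Table().
    __table_args__ = {'schema': 'cobradb'}

    id = Column(Integer, primary_key=True)
    symbol = Column(String)
```

With this in place, generated SQL refers to the table as cobradb.compartment, so it can be queried alongside tables in other schemas of the same database.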

Pierre Salvy, greatest coder in the world

@P0n3y: I know you have an elegant and beautiful function for extracting a COBRA model from the OME database. Do you think you could submit a PR for that function? We could run it by @jlerman44 too...

Thanks! I hope you guys are doing well up there.

-Zak

Numeric data type in SQLAlchemy

Hi there,

I was wondering, is there any particular reason the schema uses the Numeric type instead of the standard Float type for float values?
When I pull from the DB I get Decimal stoichiometry coefficients, and cobra really does not like them...
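
Until the column type changes, one possible workaround is to convert the coefficients on the way out of the database. The sketch below assumes the stoichiometry arrives as a plain dictionary mapping metabolite ids to Decimal values; the function name is hypothetical. On the schema side, SQLAlchemy's Numeric type also accepts asdecimal=False, which makes it return Python floats.

```python
from decimal import Decimal


def stoichiometry_as_float(stoichiometry):
    """Convert Decimal coefficients (as returned for Numeric columns) to floats."""
    return {met_id: float(coeff) for met_id, coeff in stoichiometry.items()}


# Example with made-up coefficients.
converted = stoichiometry_as_float({'atp_c': Decimal('-1'), 'adp_c': Decimal('1')})
```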

Inconsistencies in class name resolution in model_loading.py

Hi there,

I just ran into an issue when trying to set this up, in the part of ome/loading/model_loading.py that calls ome/ome/models.py: class names are referred to with underscores, such as My_Class, instead of MyClass as in the latter file.

Shall I send you a pull request?

UniqueConstraint not enforced in CompartmentalizedComponent

Hi again,

I came across an issue while uploading my model several times, at which point I discovered that the UniqueConstraint on the rows of the table CompartmentalizedComponent is not enforced, despite its declaration:

class CompartmentalizedComponent(Base):
    __tablename__='compartmentalized_component'

    id = Column(Integer, Sequence('wids'), primary_key=True)
    component_id = Column(Integer, ForeignKey('component.id'), nullable=False)
    compartment_id = Column(Integer, ForeignKey('compartment.id'), nullable=False)
    UniqueConstraint('compartment_id', 'component_id')

And I also discovered that the upload method for compartmentalized components in model_loading.py:

    def loadCompartmentalizedComponent(self, modellist, session):
        for model in modellist:
            for metabolite in model.metabolites:
                identifier = session.query(Compartment).filter(Compartment.name == metabolite.id[-1:len(metabolite.id)]).first()
                m = session.query(Metabolite).filter(Metabolite.name == metabolite.id.split("_")[0]).first()
                #m = session.query(Metabolite).filter(Metabolite.kegg_id == metabolite.notes.get("KEGGID")[0]).first()
                object = Compartmentalized_Component(component_id = m.id, compartment_id = identifier.id)
                session.add(object)

does not check for prior existence of the said compartmentalized component, unlike how it's done with the metabolites. What happens next in loadModelCompartmentalizedComponent when calling

compartmentalized_component_query = session.query(Compartmentalized_Component).filter(Compartmentalized_Component.component_id == componentquery.id).filter(Compartmentalized_Component.compartment_id == compartmentquery.id).first()

is that the component is linked to one among many identical compartmentalized components (first() call). This gets worse when you run a query like

compartmentalized_components = make_dict(session.query(CompartmentalizedComponent).\
    join(ModelCompartmentalizedComponent, CompartmentalizedComponent.id == \
        ModelCompartmentalizedComponent.compartmentalized_component_id).\
    filter(ModelCompartmentalizedComponent.model_id == modelObject.id))

to get the list of compartmentalized components used in your model, and try to fetch from it an id equal to a ReactionMatrix.compartmentalized_component_id. It does not work because the latter points to another sibling of this compartmentalized component, which is likely not the one in your model.

I don't know if this was very clear, but to sum up there are two points:

UniqueConstraint does not seem to be enforced (I have no idea as to why)
loadCompartmentalizedComponents does not check if the compartmentalized component already exists in the database.

A simple fix for this one would be:

def loadCompartmentalizedComponent(self, modellist, session):
    for model in modellist:
        for metabolite in model.metabolites:
            metaboliteObject = session.query(Metabolite).\
                filter(Metabolite.name == metabolite.id[:-2]).first()
            compartmentObject = session.query(Compartment).\
                filter(Compartment.symbol == metabolite.id[-1]).first()
            # Look for an existing row with this (component, compartment) pair.
            existing = session.query(CompartmentalizedComponent).\
                filter(CompartmentalizedComponent.component_id == metaboliteObject.id).\
                filter(CompartmentalizedComponent.compartment_id == compartmentObject.id).\
                first()
            # Add a new row only if none was found.
            if existing is None:
                object = CompartmentalizedComponent(
                    component_id = metaboliteObject.id,
                    compartment_id = compartmentObject.id)
                session.add(object)

I attached a printscreen as an illustration.
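
On the first point, a likely cause is that a UniqueConstraint written as a bare statement in a declarative class body is created and then discarded, because nothing attaches it to the table. SQLAlchemy picks up table-level constraints from __table_args__. The following is a minimal sketch rather than the project's actual model: the foreign keys are dropped to keep it self-contained, it runs against an in-memory SQLite database, and the helper `insert_twice` is hypothetical. It assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, Integer, UniqueConstraint, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base  # SQLAlchemy 1.4+

Base = declarative_base()


class CompartmentalizedComponent(Base):
    __tablename__ = 'compartmentalized_component'

    id = Column(Integer, primary_key=True)
    component_id = Column(Integer, nullable=False)    # ForeignKey omitted in this sketch
    compartment_id = Column(Integer, nullable=False)  # ForeignKey omitted in this sketch

    # Constraints listed here are attached to the table and emitted in CREATE TABLE.
    __table_args__ = (UniqueConstraint('compartment_id', 'component_id'),)


engine = create_engine('sqlite://')  # in-memory database, for illustration only
Base.metadata.create_all(engine)


def insert_twice():
    """Insert the same pair twice; return True if the database rejects the duplicate."""
    with Session(engine) as session:
        session.add(CompartmentalizedComponent(component_id=1, compartment_id=1))
        session.commit()
        session.add(CompartmentalizedComponent(component_id=1, compartment_id=1))
        try:
            session.commit()
        except IntegrityError:
            session.rollback()
            return True
    return False
```

Note that an existing table would not gain the constraint automatically; it only takes effect when the table is created or migrated.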

Components need to have unique ids

Right now the component table is unique on the name field. Unfortunately, names collide across component types: there is a transcription unit ade and the metabolite adenine (ade). Kind of annoying, because fixing this will require a get_or_create for every component addition.
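
For reference, the get_or_create pattern mentioned here is commonly written along the following lines. This is a generic sketch, not cobradb's own helper: the `Component` class is a reduced stand-in, and uniqueness on (name, type) is an assumption about how the collision might be resolved. It assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base  # SQLAlchemy 1.4+

Base = declarative_base()


class Component(Base):
    """Reduced stand-in for the component table."""
    __tablename__ = 'component'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    type = Column(String(20))


def get_or_create(session, model, **kwargs):
    """Return (instance, created): fetch the row matching kwargs or add a new one."""
    instance = session.query(model).filter_by(**kwargs).first()
    if instance is not None:
        return instance, False
    instance = model(**kwargs)
    session.add(instance)
    session.flush()  # assigns the primary key without committing
    return instance, True
```

Looking rows up by both name and type lets a transcription unit and a metabolite share the name ade without being merged.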

Normalisation issue on ModelReaction

Hello SBRG!

I thought I would be done bothering you with my annoying issues, but I found out a little redundancy problem. It appears that both ModelReaction and Reaction have a column named 'name', and it looks like they are used to store the same kind of information:

in models.py:

class ModelReaction(Base):
    __tablename__='model_reaction'
    id = Column(Integer, Sequence('wids'), primary_key=True)
    reaction_id = Column(Integer, ForeignKey('reaction.id'), nullable=False)
    model_id = Column(Integer, ForeignKey('model.id'), nullable=False)
    name = Column(String)
    upperbound = Column(Numeric)
    lowerbound = Column(Numeric)
    gpr = Column(String)

in base.py:

class Reaction(Base):
    __tablename__ = 'reaction'
    id = Column(Integer, Sequence('wids'), primary_key=True)
    biggid = Column(String)
    name = Column(String)
    long_name = Column(String)
    type = Column(String(20))
    notes = Column(String)

and in loading.py, line 403:

def loadReactions(self , modellist, session):
    [...]
    reactionObject = Reaction(name = reaction.id, long_name = reaction.name, notes = '')

and lines 483-485:

def loadModelReaction(self, modellist, session):
    [...]
    object = ModelReaction(reaction_id = reactionquery.id, model_id = modelquery.id, name = reaction.id, upperbound = reaction.upper_bound, lowerbound = reaction.lower_bound, gpr = reaction.gene_reaction_rule)

This leads to some data redundancy when you join the two tables, which might be undesirable (especially if you are a normalization fanatic like me ahaha)

Anyway, hope this helps!

Case sensitive directory names in Ubuntu

Minor issue here - will send a pull request once I resolve some other Linux-specific issues.

GenBank vs. genbank in loading/component_loading.py causes some issues for Ubuntu users.

@@ -317,7 +317,8 @@ def load_genome(genbank_file, base, components, debug=False):
     session = base.Session()
     try:
-        gb_file = SeqIO.read(settings.data_directory+'/annotation/GenBank/'+genbank_file,'gb')
+        print settings.data_directory+'/annotation/GenBank/'+genbank_file
+        gb_file = SeqIO.read(settings.data_directory+'/annotation/genbank/'+genbank_file,'gb')
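
An alternative to hard-coding either spelling is to resolve the directory name without regard to case, so the same code works on case-sensitive and case-insensitive filesystems. The helper below is a hypothetical sketch using only the standard library, not code from the repository.

```python
import os


def resolve_case_insensitive(parent, name):
    """Return the path of the entry in `parent` whose name matches `name`, ignoring case."""
    exact = os.path.join(parent, name)
    if os.path.exists(exact):
        return exact
    for entry in os.listdir(parent):
        if entry.lower() == name.lower():
            return os.path.join(parent, entry)
    raise FileNotFoundError('no entry named %r (any case) in %r' % (name, parent))
```

A call such as `resolve_case_insensitive(annotation_dir, 'GenBank')` would then find a directory named either GenBank or genbank, where `annotation_dir` is an assumed variable holding the annotation path.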

model dumper should add linkouts as annotation

The various objects in cobrapy all have an annotation attribute. These should get populated with the linkouts from the database when dumping the model.
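
A rough sketch of how linkout rows might be turned into such an annotation dictionary is shown below. The row shape (source, external id) and the function name are assumptions for illustration, not the database's actual linkout schema, and the identifiers in the example are illustrative values.

```python
from collections import defaultdict


def build_annotation(linkouts):
    """Group (source, external_id) pairs into a dict of lists keyed by source."""
    annotation = defaultdict(list)
    for source, external_id in linkouts:
        annotation[source].append(external_id)
    return dict(annotation)


# Example rows, as they might come back from a linkout query (illustrative values).
rows = [
    ('kegg.compound', 'C00031'),
    ('chebi', 'CHEBI:4167'),
    ('chebi', 'CHEBI:17234'),
]
annotation = build_annotation(rows)
```

The resulting dictionary could then be assigned to the annotation attribute of the corresponding cobrapy object when the model is dumped.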

All genomic locations are off by 1 in all organisms (leftpos too low)

https://ecocyc.org/gene?orgid=ECOLI&id=EG11555
[6,529 <- 7,959] (0.14 centisomes, 1°)

ME-Model code would've caught this easy ;).
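
One plausible cause, stated here as an assumption: if the positions are stored straight from Biopython feature locations, they are 0-based with an exclusive end (Python slice convention), whereas EcoCyc reports 1-based inclusive coordinates. In that case only the left position is one too low, which matches the title. A minimal conversion sketch, with a hypothetical function name:

```python
def to_one_based_inclusive(start0, end0):
    """Convert a 0-based, end-exclusive interval to 1-based, end-inclusive coordinates."""
    # The start shifts by one; the exclusive end already equals the inclusive 1-based end.
    return start0 + 1, end0


# A gene stored as [6528, 7959) corresponds to 6,529..7,959 in 1-based coordinates.
leftpos, rightpos = to_one_based_inclusive(6528, 7959)
```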

setup.py install_packages is broken

The version ranges are not specific enough. Running setup.py in a blank conda env with only Python 3.7 installed yields a dependency resolution error: a jupyter version > 6 gets installed that requires tornado > 6, but tornado 4.5.3 was installed earlier.

Furthermore, even after pinning all requirements accordingly, some requirements are still missing, in particular cython.

More generally, this project would benefit greatly from a pip freeze requirements.txt file which is suitable for building.

Since BiGG no longer provides SQL dumps, building cobradb has become a prerequisite for acquiring a dump, and it is unfortunate that this is not easily accomplished.

SQLAlchemy 0.9.8 will throw warnings for use of 'type' column in multiple inheritance

Newer versions of SQLAlchemy are throwing a warning for the way in which the multiple inheritance (e.g. TU inherits from RNA which inherits from Component) works. I believe this is just a configuration issue and does not appear to be affecting actual data loading or queries. Will revisit when time permits but if anyone has issues on newer versions of SQLAlchemy this may need to be addressed sooner.

metabolite_duplicates specified twice in settings.ini.example

See

cobradb/settings.ini.example

Line 41 in 4bc9153

metabolite_duplicates = ~/path/to/ome_data/metabolite-duplicates.txt

and

cobradb/settings.ini.example

Line 51 in 4bc9153

metabolite_duplicates = ~/path/to/ome_data/metabolite-duplicates.txt
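
Duplicates like this can be caught automatically, since Python 3's configparser rejects a repeated option within a section when strict mode is on (the default). The check below is a small sketch; the function name and the [data] section used in the example are assumptions, not taken from the actual file.

```python
import configparser


def find_duplicate_option(ini_text):
    """Return the name of the first duplicated option, or None if there is none."""
    parser = configparser.ConfigParser(strict=True)
    try:
        parser.read_string(ini_text)
    except configparser.DuplicateOptionError as err:
        return err.option
    return None


# Example with a hypothetical section name.
example = (
    "[data]\n"
    "metabolite_duplicates = ~/path/to/ome_data/metabolite-duplicates.txt\n"
    "other_setting = 1\n"
    "metabolite_duplicates = ~/path/to/ome_data/metabolite-duplicates.txt\n"
)
duplicate = find_duplicate_option(example)
```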
