sbrg / cobradb

COBRAdb loads genome-scale metabolic models and genome annotations into a relational database. It already powers BiGG Models, and it is available under the MIT license.

Home Page: http://bigg.ucsd.edu

License: Other

Python 99.83% Shell 0.17%
python database sqlalchemy

cobradb's Issues

Quick speedup in loading/model_loading.py

Hi there,

I just realized that the way compartments are being uploaded is quite inefficient:

def loadCompartments(self, modellist, session):
    for model in modellist:
        for component in model.metabolites:
            if component.id is not None:
                if not session.query(Compartment).filter(Compartment.name == component.id[-1:len(component.id)]).count():
                    compartmentObject = Compartment(name = component.id[-1:len(component.id)])
                    session.add(compartmentObject)

This generates ~1000 queries (one per metabolite), which takes around 5-10 s.
Since the number of compartments should be small, I believe this approach might be better:

def loadCompartments(self, modellist):
    # Collect the distinct compartment symbols first
    # (the last character of each metabolite id).
    compartments_all = set()
    for model in modellist:
        for component in model.metabolites:
            if component.id is not None:
                compartments_all.add(component.id[-1])
    # One existence check per distinct symbol instead of one per metabolite.
    for symbol in compartments_all:
        if not self.session.query(Compartment).filter(Compartment.symbol == symbol).count():
            compartmentObject = Compartment(symbol = symbol)
            self.session.add(compartmentObject)

This generates only as many queries as there are compartments in a model (usually 3-5).
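
The deduplication step can be tried out without a database. The sketch below is an illustration only: it assumes, as the code above does, that the last character of a metabolite id is its compartment symbol. The helper name and the sample ids are hypothetical.

```python
def distinct_compartment_symbols(metabolite_ids):
    """Return the set of compartment symbols found in the given metabolite ids.

    Assumes the last character of each id is the compartment symbol.
    """
    symbols = set()
    for metabolite_id in metabolite_ids:
        # Skip missing or empty ids, similar to the `is not None` guard above.
        if metabolite_id:
            symbols.add(metabolite_id[-1])
    return symbols


# Hypothetical ids: four usable metabolites, but only three distinct compartments.
sample_ids = ["atp_c", "adp_c", "glc__D_e", "h_p", None]
symbols = distinct_compartment_symbols(sample_ids)
print(sorted(symbols))
```

Only the three distinct symbols would then need an existence check against the database.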

Anyway, it's only a constant 5-10 s speedup, but over time it adds up to a lot if you do intensive testing. It is also easier on your DB, if you're the DB-happiness-caring type like I am.

Best,

Pierre

We should use a named schema

A named schema would help enable queries across multiple PostgreSQL schemas for users who wish to integrate cobradb into other systems. This can be done with __table_args__ = {'schema': 'cobradb'} in the class definitions.

Pierre Salvy, greatest coder in the world

@P0n3y: I know you have an elegant and beautiful function for extracting a COBRA model from the OME database. Do you think you could submit a PR for that function? We could run it by @jlerman44 too...

Thanks! I hope you guys are doing well up there.

-Zak

Numeric data type in SQLAlchemy

Hi there,

I was wondering, is there any particular reason the schema uses the Numeric type instead of the standard Float type for float values?
When I pull from the DB I get Decimal stoichiometry, and cobra really does not like it...
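
For illustration, the snippet below reproduces one kind of failure that Decimal values can cause, using only the standard library. Whether this is the exact error cobra raises is an assumption, and the variable names are made up. As a side note, SQLAlchemy's Numeric type accepts asdecimal=False, which makes it return floats and may be a smaller change than switching the column type.

```python
from decimal import Decimal

# A stoichiometric coefficient as a Numeric column would return it by default.
coefficient = Decimal("-1.5")
flux = 2.0  # an ordinary Python float, as numerical code usually produces

try:
    coefficient * flux
    mixed_ok = True
except TypeError:
    # Decimal refuses arithmetic with float operands.
    mixed_ok = False

# Converting on the way out of the database avoids the problem.
as_float = float(coefficient)
product = as_float * flux
print(mixed_ok, product)
```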

Inconsistencies in class name resolution in model_loading.py

Hi there,

I just ran into an issue while trying to set this up: ome/loading/model_loading.py calls into ome/ome/models.py, but the two files do not agree on class names. One refers to classes in the style My_Class, while the other uses MyClass.

Shall I send you a pull request?

UniqueConstraint not enforced in CompartmentalizedComponent

Hi again,

I came across an issue while uploading my model several times, at which point I discovered that the UniqueConstraint on the rows of the CompartmentalizedComponent table is not enforced, despite its declaration:

class CompartmentalizedComponent(Base):
    __tablename__='compartmentalized_component'

    id = Column(Integer, Sequence('wids'), primary_key=True)
    component_id = Column(Integer, ForeignKey('component.id'), nullable=False)
    compartment_id = Column(Integer, ForeignKey('compartment.id'), nullable=False)
    UniqueConstraint('compartment_id', 'component_id')

And I also discovered that the upload method for compartmentalized components in model_loading.py:

    def loadCompartmentalizedComponent(self, modellist, session):
        for model in modellist:
            for metabolite in model.metabolites:
                identifier = session.query(Compartment).filter(Compartment.name == metabolite.id[-1:len(metabolite.id)]).first()
                m = session.query(Metabolite).filter(Metabolite.name == metabolite.id.split("_")[0]).first()
                #m = session.query(Metabolite).filter(Metabolite.kegg_id == metabolite.notes.get("KEGGID")[0]).first()
                object = Compartmentalized_Component(component_id = m.id, compartment_id = identifier.id)
                session.add(object)

does not check for prior existence of the said compartmentalized component, unlike how it's done with the metabolites. What happens next in loadModelCompartmentalizedComponent when calling

compartmentalized_component_query = session.query(Compartmentalized_Component).filter(Compartmentalized_Component.component_id == componentquery.id).filter(Compartmentalized_Component.compartment_id == compartmentquery.id).first()

is that the component is linked to one among many identical compartmentalized components (first() call). This gets worse when you run a query like

compartmentalized_components = make_dict(session.query(CompartmentalizedComponent).\
    join(ModelCompartmentalizedComponent, CompartmentalizedComponent.id == \
        ModelCompartmentalizedComponent.compartmentalized_component_id).\
    filter(ModelCompartmentalizedComponent.model_id == modelObject.id))

to get the list of compartmentalized components used in your model, and try to fetch from it an id equal to a ReactionMatrix.compartmentalized_component_id. It does not work because the latter points to another sibling of this compartmentalized component, which is likely not the one in your model.

I don't know if this was very clear, but to sum up there are two points:

  • UniqueConstraint does not seem to be enforced (I have no idea as to why)
  • loadCompartmentalizedComponents does not check if the compartmentalized component already exists in the database.

A simple fix for this one would be:

def loadCompartmentalizedComponent(self, modellist, session):
    for model in modellist:
        for metabolite in model.metabolites:
            # Metabolite ids are assumed to look like "<name>_<compartment symbol>".
            metaboliteObject = session.query(Metabolite).\
                filter(Metabolite.name == metabolite.id[:-2]).first()
            compartmentObject = session.query(Compartment).\
                filter(Compartment.symbol == metabolite.id[-1]).first()
            # Only add the pair if it is not already in the database.
            existing = session.query(CompartmentalizedComponent).\
                filter(CompartmentalizedComponent.component_id == metaboliteObject.id).\
                filter(CompartmentalizedComponent.compartment_id == compartmentObject.id).\
                first()
            if existing is None:
                object = CompartmentalizedComponent(
                    component_id = metaboliteObject.id,
                    compartment_id = compartmentObject.id)
                session.add(object)

I attached a screenshot as an illustration (bug_report_ome_cc).
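
A likely cause of the first point, offered as an assumption rather than a confirmed diagnosis: in SQLAlchemy's declarative classes, a UniqueConstraint written as a bare statement in the class body is created and then discarded. To reach the table it normally has to be listed in __table_args__, for example __table_args__ = (UniqueConstraint('compartment_id', 'component_id'),). The sketch below uses the standard-library sqlite3 module rather than SQLAlchemy or the project's schema, to show what the database does once such a constraint actually exists. Table and column names mirror the issue; everything else is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE compartmentalized_component (
        id INTEGER PRIMARY KEY,
        component_id INTEGER NOT NULL,
        compartment_id INTEGER NOT NULL,
        UNIQUE (compartment_id, component_id)
    )
    """
)

insert = (
    "INSERT INTO compartmentalized_component (component_id, compartment_id) "
    "VALUES (?, ?)"
)
conn.execute(insert, (1, 1))

try:
    # Same (component, compartment) pair again, as when a model is re-uploaded.
    conn.execute(insert, (1, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM compartmentalized_component"
).fetchone()[0]
print(duplicate_rejected, row_count)
```

With the constraint in place, a loader that skips the existence check fails loudly instead of silently creating sibling rows.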

Components need to have unique ids

Right now the component table is unique on the name field. Unfortunately, names can collide: there is a transcription unit ade and also adenine ... Fixing this is kind of annoying because it will require a get_or_create for every component addition.
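
A get_or_create helper could look roughly like the sketch below. It uses the standard-library sqlite3 module instead of the project's SQLAlchemy session, and it assumes uniqueness on (name, type) purely for illustration. The table layout, the type values and the helper name are all hypothetical.

```python
import sqlite3


def get_or_create_component(conn, name, component_type):
    """Return the id of the matching component row, inserting it if absent."""
    row = conn.execute(
        "SELECT id FROM component WHERE name = ? AND type = ?",
        (name, component_type),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO component (name, type) VALUES (?, ?)",
        (name, component_type),
    )
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL, "
    "UNIQUE (name, type))"
)

# The same name can now exist once per type without clashing.
tu_id = get_or_create_component(conn, "ade", "tu")
metabolite_id = get_or_create_component(conn, "ade", "metabolite")
tu_id_again = get_or_create_component(conn, "ade", "tu")
```

The repeated call returns the existing row's id rather than inserting a duplicate.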

Normalisation issue on ModelReaction

Hello SBRG!

I thought I would be done bothering you with my annoying issues, but I found out a little redundancy problem. It appears that both ModelReaction and Reaction have a column named 'name', and it looks like they are used to store the same kind of information:

in models.py:

class ModelReaction(Base):
    __tablename__='model_reaction'
    id = Column(Integer, Sequence('wids'), primary_key=True)
    reaction_id = Column(Integer, ForeignKey('reaction.id'), nullable=False)
    model_id = Column(Integer, ForeignKey('model.id'), nullable=False)
    name = Column(String)
    upperbound = Column(Numeric)
    lowerbound = Column(Numeric)
    gpr = Column(String)

in base.py:

class Reaction(Base):
    __tablename__ = 'reaction'
    id = Column(Integer, Sequence('wids'), primary_key=True)
    biggid = Column(String)
    name = Column(String)
    long_name = Column(String)
    type = Column(String(20))
    notes = Column(String)

and in loading.py, line 403:

def loadReactions(self , modellist, session):
    [...]
    reactionObject = Reaction(name = reaction.id, long_name = reaction.name, notes = '')

and lines 483-485:

def loadModelReaction(self, modellist, session):
    [...]
    object = ModelReaction(reaction_id = reactionquery.id, model_id = modelquery.id, name = reaction.id, upperbound = reaction.upper_bound, lowerbound = reaction.lower_bound, gpr = reaction.gene_reaction_rule)

This leads to some data redundancy when you join the two tables, which might be undesirable (especially if you are a normalization fanatic like me ahaha)
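
To illustrate, the sketch below uses the standard-library sqlite3 module with trimmed-down versions of the two tables (only the columns relevant here, with made-up sample values). It assumes name is stored only on reaction, and shows that a join still recovers it for every model_reaction row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE model_reaction (
        id INTEGER PRIMARY KEY,
        reaction_id INTEGER NOT NULL REFERENCES reaction(id),
        model_id INTEGER NOT NULL,
        upperbound REAL,
        lowerbound REAL
    );
    INSERT INTO reaction (id, name) VALUES (1, 'PGI');
    INSERT INTO model_reaction (reaction_id, model_id, upperbound, lowerbound)
        VALUES (1, 10, 1000.0, -1000.0);
    """
)

# The reaction name comes from the reaction table; model_reaction keeps
# only the model-specific bounds.
rows = conn.execute(
    "SELECT r.name, mr.model_id, mr.lowerbound, mr.upperbound "
    "FROM model_reaction AS mr JOIN reaction AS r ON r.id = mr.reaction_id"
).fetchall()
print(rows)
```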

Anyway, hope this helps!

Case sensitive directory names in Ubuntu

Minor issue here - will send a pull request once I resolve some other Linux-specific issues.

GenBank vs. genbank in loading/component_loading.py causes some issues for Ubuntu users, since directory names are case sensitive there.

@@ -317,7 +317,8 @@ def load_genome(genbank_file, base, components, debug=False):

     session = base.Session()
     try:
-        gb_file = SeqIO.read(settings.data_directory+'/annotation/GenBank/'+genbank_file,'gb')
+        print settings.data_directory+'/annotation/GenBank/'+genbank_file
+        gb_file = SeqIO.read(settings.data_directory+'/annotation/genbank/'+genbank_file,'gb')
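
An alternative to hard-coding one spelling would be to resolve the directory name without regard to case. The sketch below is only an illustration, written for Python 3 with the standard library. The helper name is hypothetical and nothing here is taken from cobradb itself.

```python
import os
import tempfile


def find_subdirectory(parent, name):
    """Return the path of the directory in `parent` matching `name`, ignoring case.

    Returns None when nothing matches.
    """
    for entry in os.listdir(parent):
        candidate = os.path.join(parent, entry)
        if entry.lower() == name.lower() and os.path.isdir(candidate):
            return candidate
    return None


# Demonstration in a throwaway directory laid out like the path in the diff.
with tempfile.TemporaryDirectory() as data_directory:
    os.makedirs(os.path.join(data_directory, "annotation", "genbank"))
    annotation = os.path.join(data_directory, "annotation")
    found = find_subdirectory(annotation, "GenBank")
    found_name = os.path.basename(found)
    missing = find_subdirectory(annotation, "embl")
```

The lookup returns the directory as it is actually spelled on disk, so the same code works on case-sensitive and case-insensitive file systems.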
    

setup.py install_packages is broken

The version ranges are not specific enough. Running setup.py in a blank conda env with just Python 3.7 installed yields a dependency resolution error: a jupyter version > 6 is installed that requires tornado version > 6, but tornado 4.5.3 has already been installed earlier.

Furthermore, even after pinning all requirements accordingly, some requirements are still missing, in particular cython.

More generally, this project would benefit greatly from a pip freeze requirements.txt file that is suitable for building.

Since BiGG no longer provides SQL dumps, building cobradb has become a prerequisite for acquiring a dump, and it is unfortunate that this is not easily accomplished.

SQLAlchemy 0.9.8 will throw warnings for use of 'type' column in multiple inheritance

Newer versions of SQLAlchemy are throwing a warning for the way in which the multiple inheritance (e.g. TU inherits from RNA which inherits from Component) works. I believe this is just a configuration issue and does not appear to be affecting actual data loading or queries. Will revisit when time permits but if anyone has issues on newer versions of SQLAlchemy this may need to be addressed sooner.
