pybel / pybel
An ecosystem in Python for working with the Biological Expression Language (BEL)
Home Page: http://pybel.readthedocs.io
License: MIT License
I already added an RST page with automodule, so add the docs in the right format inside the classes, their __init__ methods, and the modules so that they can get propagated through. You can also add any design notes to manager.rst that you want.
In pybel.manager.defaults, add a second list called default_annotations with the different iterations of Selventa's default annotations
The data model for namespaces and annotations should be the same, but there are cases when an ontology and a namespace are both using the same name (reference BRCO in the AD model), so let's keep their data separate for now, even though the logic might be the same.
Locations should be translated to attributes of the relation between two objects, the same way transformations and activities are.
Problem: Character sets in BEL could end up in invalid GraphML output
pybel convert --path /pathTo/PD.bel --graphml /pathTo/PD.graphml
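One plausible cause of the problem above is that XML 1.0, and therefore GraphML, forbids most control characters. The following is a minimal sketch, assuming the fix is to strip such characters from string attributes before export. The helper name strip_invalid_xml_chars is hypothetical and not part of PyBEL.

```python
import re

# Matches any character outside the ranges that the XML 1.0 specification allows.
_INVALID_XML_CHARS = re.compile(
    '[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def strip_invalid_xml_chars(text):
    """Remove characters that cannot appear in an XML 1.0 document (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub('', text)


cleaned = strip_invalid_xml_chars('AKT1\x00 kinase\x0b')
```

Legitimate non-ASCII text such as Greek letters passes through unchanged, so only the characters that would break the XML serializer are dropped.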
Make a custom error handler for tloc() statements that come in the form tloc(<abundance>) and cannot be handled
After loading a BEL document into a graph, output it again as BEL statements. Serialize:
These statements should be canonical BEL 2.0, organized by:
In pybel.manager.defaults, include a much more thorough list of namespaces using all 3 Selventa releases, and consider using some of the higher-end BELIEF namespaces (consult the AD and PD AETIONOMY BEL models for what they use)
Use these:
default_namespaces = [
# 1.0 Release
'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u133-plus2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u133ab.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u95av2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-mg-u74abc.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-moe430ab.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-mouse430-2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-mouse430a-2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-rae230ab-2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/affy-rat230-2.belns',
'http://resources.openbel.org/belframework/1.0/namespace/chebi-ids.belns',
'http://resources.openbel.org/belframework/1.0/namespace/chebi-names.belns',
'http://resources.openbel.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns',
'http://resources.openbel.org/belframework/1.0/namespace/go-biological-processes-accession-numbers.belns',
'http://resources.openbel.org/belframework/1.0/namespace/go-biological-processes-names.belns',
'http://resources.openbel.org/belframework/1.0/namespace/go-cellular-component-accession-numbers.belns',
'http://resources.openbel.org/belframework/1.0/namespace/go-cellular-component-terms.belns',
'http://resources.openbel.org/belframework/1.0/namespace/hgnc-approved-symbols.belns',
'http://resources.openbel.org/belframework/1.0/namespace/mesh-biological-processes.belns',
'http://resources.openbel.org/belframework/1.0/namespace/mesh-cellular-locations.belns',
'http://resources.openbel.org/belframework/1.0/namespace/mesh-diseases.belns',
'http://resources.openbel.org/belframework/1.0/namespace/mgi-approved-symbols.belns',
'http://resources.openbel.org/belframework/1.0/namespace/rgd-approved-symbols.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-legacy-chemical-names.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-legacy-diseases.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-human-complexes.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-human-protein-families.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-mouse-complexes.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-mouse-protein-families.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-rat-complexes.belns',
'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-rat-protein-families.belns',
'http://resources.openbel.org/belframework/1.0/namespace/swissprot-accession-numbers.belns',
'http://resources.openbel.org/belframework/1.0/namespace/swissprot-entry-names.belns',
# 2013 Release
'http://resources.openbel.org/belframework/20131211/namespace/affy-probeset-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/chebi-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/chebi.belns',
'http://resources.openbel.org/belframework/20131211/namespace/disease-ontology-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/disease-ontology.belns',
'http://resources.openbel.org/belframework/20131211/namespace/entrez-gene-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/go-biological-process-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/go-biological-process.belns',
'http://resources.openbel.org/belframework/20131211/namespace/go-cellular-component-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/go-cellular-component.belns',
'http://resources.openbel.org/belframework/20131211/namespace/hgnc-human-genes.belns',
'http://resources.openbel.org/belframework/20131211/namespace/mesh-cellular-structures.belns',
'http://resources.openbel.org/belframework/20131211/namespace/mesh-diseases.belns',
'http://resources.openbel.org/belframework/20131211/namespace/mesh-processes.belns',
'http://resources.openbel.org/belframework/20131211/namespace/mgi-mouse-genes.belns',
'http://resources.openbel.org/belframework/20131211/namespace/rgd-rat-genes.belns',
'http://resources.openbel.org/belframework/20131211/namespace/selventa-legacy-chemicals.belns',
'http://resources.openbel.org/belframework/20131211/namespace/selventa-legacy-diseases.belns',
'http://resources.openbel.org/belframework/20131211/namespace/selventa-named-complexes.belns',
'http://resources.openbel.org/belframework/20131211/namespace/selventa-protein-families.belns',
'http://resources.openbel.org/belframework/20131211/namespace/swissprot-ids.belns',
'http://resources.openbel.org/belframework/20131211/namespace/swissprot.belns',
# 2015 Release
'http://resource.belframework.org/belframework/20150611/namespace/affy-probeset-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/chebi-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/chebi.belns',
'http://resource.belframework.org/belframework/20150611/namespace/disease-ontology-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/disease-ontology.belns',
'http://resource.belframework.org/belframework/20150611/namespace/entrez-gene-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/go-biological-process-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/go-biological-process.belns',
'http://resource.belframework.org/belframework/20150611/namespace/go-cellular-component-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/go-cellular-component.belns',
'http://resource.belframework.org/belframework/20150611/namespace/hgnc-human-genes.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-cellular-structures-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-cellular-structures.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-chemicals-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-chemicals.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-diseases-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-diseases.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-processes-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mesh-processes.belns',
'http://resource.belframework.org/belframework/20150611/namespace/mgi-mouse-genes.belns',
'http://resource.belframework.org/belframework/20150611/namespace/rgd-rat-genes.belns',
'http://resource.belframework.org/belframework/20150611/namespace/selventa-legacy-chemicals.belns',
'http://resource.belframework.org/belframework/20150611/namespace/selventa-legacy-diseases.belns',
'http://resource.belframework.org/belframework/20150611/namespace/selventa-named-complexes.belns',
'http://resource.belframework.org/belframework/20150611/namespace/selventa-protein-families.belns',
'http://resource.belframework.org/belframework/20150611/namespace/swissprot-ids.belns',
'http://resource.belframework.org/belframework/20150611/namespace/swissprot.belns'
]
default_annotations = [
# 1.0 Release
'http://resource.belframework.org/belframework/1.0/annotation/atcc-cell-line.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-body-region.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-cardiovascular-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-cell-structure.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-cell.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-digestive-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-disease.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-embryonic-structure.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-endocrine-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-fluid-and-secretion.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-hemic-and-immune-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-integumentary-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-musculoskeletal-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-nervous-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-respiratory-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-sense-organ.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-stomatognathic-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-tissue.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/mesh-urogenital-system.belanno',
'http://resource.belframework.org/belframework/1.0/annotation/species-taxonomy-id.belanno',
# 2013 Release
'http://resource.belframework.org/belframework/20131211/annotation/anatomy.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/cell-line.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/cell-structure.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/cell.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/disease.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/mesh-anatomy.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/mesh-diseases.belanno',
'http://resource.belframework.org/belframework/20131211/annotation/species-taxonomy-id.belanno',
# 2015 Release
'http://resource.belframework.org/belframework/20150611/annotation/anatomy.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/cell-line.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/cell-structure.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/cell.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/disease.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/mesh-anatomy.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/mesh-diseases.belanno',
'http://resource.belframework.org/belframework/20150611/annotation/species-taxonomy-id.belanno'
]
Open Source License
Contributor License Agreement
Further Reading
Use MatchFirst in all lists, and streamline the individual items upon building, like self.protein and eventually self.bel_term, within pybel.parsers.bel_parser.py
Finally, do memory checks, then hopefully make it possible for Travis to run the whole test suite
I want to add to the CLI something like pybel nscache list to list all of the URLs that have been stored in the cache. Can you make a function DefinitionCacheManager.ls(stream=sys.stdout) that prints this info to the given stream (defaulting to stdout)?
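A minimal sketch of the requested method, assuming the manager can enumerate its cached URLs. The class below is a toy stand-in that keeps URLs in an in-memory list; the real DefinitionCacheManager storage and constructor are not shown in this request and are not reproduced here.

```python
import io
import sys


class DefinitionCacheManager:
    """Toy stand-in for the real manager; it only stores cached URLs in memory."""

    def __init__(self, urls=None):
        # Hypothetical storage: the real manager would query its database instead.
        self.urls = list(urls or [])

    def ls(self, stream=sys.stdout):
        """Print each cached URL on its own line to the given stream."""
        for url in sorted(self.urls):
            print(url, file=stream)


# Usage: capture the listing in a buffer instead of writing to stdout.
buffer = io.StringIO()
manager = DefinitionCacheManager(['http://b.example/x.belns', 'http://a.example/y.belns'])
manager.ls(stream=buffer)
```

Taking the stream as a parameter keeps the method easy to test and lets the CLI command simply call it with the default.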
Create Jupyter Notebook showcasing a nice use case.
Todo:
Options:
Add support and rewriting for BEL 2.0
https://github.com/OpenBEL/language/blob/master/version_2.0/MIGRATE_BEL1_BEL2.md#variations
Should we keep all namespaces or are some dispensable?
Parse BEL statements in the form of subject_1 predicate_1 (subject_2 predicate_2 object) and load them into the graph.
Example:
p(HGNC:AKT1) -> (p(HGNC:AKT2) -> p(HGNC:PIK3CA))
To Do:
predicate_1
and predicate_2
When adding a relation, take the Cartesian product of all annotations that contain lists, and add one edge to the graph for each resulting context.
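The expansion described above can be sketched with itertools.product. This is an illustration under assumptions: annotations are held in a plain dict whose values are either scalars or lists, and expand_annotations is a hypothetical helper name.

```python
import itertools


def expand_annotations(annotations):
    """Yield one flat annotation dict per combination of list-valued entries."""
    keys = sorted(annotations)
    # Wrap scalar values in a one-element list so every key contributes exactly one choice.
    choices = [
        annotations[key] if isinstance(annotations[key], (list, tuple)) else [annotations[key]]
        for key in keys
    ]
    for combination in itertools.product(*choices):
        yield dict(zip(keys, combination))


contexts = list(expand_annotations({
    'Species': '9606',
    'Tissue': ['brain', 'liver'],
    'Disease': ['AD', 'PD'],
}))
```

Each resulting dict would become the data of one edge, so a statement with two tissues and two diseases produces four edges.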
This database should be importable to python as a dictionary of {namespace: {name: (canonical_namespace, canonical_name)}}
To Do:
sqlalchemy
models
Ultimately, the API should look like:
>>> import sqlite3
>>> import pybel
>>> conn = sqlite3.connect(":memory:")
>>> mapping = pybel.nsdb.load_mapping_db(conn)
Or, when using sqlalchemy:
>>> from sqlalchemy import create_engine
>>> eng = create_engine("sqlite://")
>>> mapping = pybel.nsdb.load_mapping_db(eng)
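A rough sketch of what a loader with the proposed return shape could look like for the sqlite3 case. The table name mapping and its four columns are assumptions made for illustration; the issue does not define a schema, and pybel.nsdb does not exist yet.

```python
import sqlite3
from collections import defaultdict


def load_mapping_db(conn):
    """Build {namespace: {name: (canonical_namespace, canonical_name)}} from a sqlite3 connection."""
    mapping = defaultdict(dict)
    # Hypothetical schema: one row per (namespace, name) pair with its canonical counterpart.
    cursor = conn.execute(
        'SELECT namespace, name, canonical_namespace, canonical_name FROM mapping'
    )
    for namespace, name, canonical_namespace, canonical_name in cursor:
        mapping[namespace][name] = (canonical_namespace, canonical_name)
    return dict(mapping)


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE mapping '
    '(namespace TEXT, name TEXT, canonical_namespace TEXT, canonical_name TEXT)'
)
conn.execute("INSERT INTO mapping VALUES ('EGID', '207', 'HGNC', 'AKT1')")
mapping = load_mapping_db(conn)
```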
Not for post translational modifications though
py2neo graphs?
cli.py
Consider the AETIONOMY API for neo2django, but not too hard. This should be sensical in its own right, and neo2django might be overdue for changes as well.
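Make all log statements pass their arguments to the logger with %-style placeholders instead of formatting the message up front, because the formatting is then deferred until the message is actually emitted. Example: log.info('Log this: %s', this)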
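To make the difference concrete, here is a small sketch using only the standard logging module. The variable names are illustrative.

```python
import logging

log = logging.getLogger('pybel.example')
log.setLevel(logging.INFO)

namespace = 'HGNC'

# Preferred: the logger interpolates the arguments only if the record is emitted.
log.info('Loaded namespace: %s', namespace)

# Avoid: the string is built eagerly, even when the INFO level is disabled.
log.info('Loaded namespace: {}'.format(namespace))

# The deferred interpolation happens in LogRecord.getMessage().
record = logging.LogRecord(
    'pybel.example', logging.INFO, 'example.py', 1, 'Loaded namespace: %s', (namespace,), None
)
message = record.getMessage()
```

Both calls produce the same text when the message is emitted; the first one skips the formatting work whenever the level is filtered out.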
Make a Jupyter Notebook describing where to get named protein complex information from the BEL Framework. Parse this information and use it to post-process a graph. Later, this code can be written as a post-processing tool in either pybel-core or pybel-tools
Same issue as pybel/pybel-notebooks#3
Later:
Define BELGraph.__getitem__
or BELGraph.query_edge.__getitem__
so queries like this can be made:
>>> import pybel
>>> g = pybel.from_url('http://resource.belframework.org/belframework/20150611/knowledge/small_corpus.bel')
>>> g.query_edge[g.relation == 'decreases']
...
>>> g.query_edge[(g.relation == 'decreases') | (g.relation == 'directlyDecreases')]
...
>>> g.query_node[g.type == 'Protein']
...
Hopefully we can recycle the code from AETIONOMY. This stuff can all go into the BELGraph subclass of nx.MultiDiGraph
Built in to main BELGraph and CLI
Exit Codes:
While it's not mentioned in the BEL 2.0 Specification, the BEL 1.0 Specification item 5.1.3 states that only the following metadata may be annotated:
Hello guys,
I am trying to add my thesis to PyBEL, and I found that only one annotation is saved to the edges even though there is more than one.
Could you please check that out?
Thanks,
Daniel
This issue used to be called 'Export to RDF', but I think we should assemble our thoughts on why RDF is a bad way to do data interchange for this kind of data, and add that to the Design Choices section of the documentation
Caches for prior knowledge about subClassOf relationships for any nodes like diseases in a hierarchy, protein families, components of protein complexes
A sqlalchemy model that connects data in the definitions cache with relationships like subClassOf, partOf, memberOf, etc.
SUBCLASS_TABLE_NAME = 'pybel_cache_subclass'

class Subclass(Base):
    """This table represents the many-to-many subclass relationships between names or annotations"""
    __tablename__ = SUBCLASS_TABLE_NAME
    parent = Column(Integer, ForeignKey('{}.id'.format(CONTEXT_TABLE_NAME)), primary_key=True)
    child = Column(Integer, ForeignKey('{}.id'.format(CONTEXT_TABLE_NAME)), primary_key=True)
add_heirarchy(list_of_pairs) to DefinitionCacheManager that takes a list of pairs (parent, child) and inserts the appropriate edges to this table
nx.DiGraph whose edges are pairs of (parent, child)
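The pair-insertion step can be sketched as follows. To stay dependency-free, this sketch swaps in the standard-library sqlite3 module instead of the SQLAlchemy model proposed above, and it uses a standalone function named add_hierarchy in place of the proposed manager method; both choices are assumptions for illustration only.

```python
import sqlite3

SUBCLASS_TABLE_NAME = 'pybel_cache_subclass'


def add_hierarchy(conn, list_of_pairs):
    """Insert (parent, child) id pairs, skipping pairs that are already stored."""
    # The composite primary key mirrors the two primary_key columns of the proposed model.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS {} ('
        'parent INTEGER NOT NULL, child INTEGER NOT NULL, '
        'PRIMARY KEY (parent, child))'.format(SUBCLASS_TABLE_NAME)
    )
    conn.executemany(
        'INSERT OR IGNORE INTO {} (parent, child) VALUES (?, ?)'.format(SUBCLASS_TABLE_NAME),
        list_of_pairs,
    )
    conn.commit()


conn = sqlite3.connect(':memory:')
add_hierarchy(conn, [(1, 2), (1, 3), (1, 2)])
rows = conn.execute(
    'SELECT parent, child FROM {} ORDER BY parent, child'.format(SUBCLASS_TABLE_NAME)
).fetchall()
```

The same list of pairs could also be handed to a directed graph as its edge list, which is the second input form mentioned above.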
BEL namespace files specify in which abundance functions namespaces can be used. Add a semantic validator (possibly after compilation) for this.
The OpenBEL framework provides equivalence (*.beleq) files in the same format as the namespace (*.belns) and annotation (*.belanno) files. These files map each value to a hash that represents its "equivalence class".
Example: http://resources.openbel.org/belframework/20150611/equivalence/disease-ontology.beleq
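A small sketch of how such a file could be grouped into equivalence classes. It assumes the values live in a [Values] section with one value|hash entry per line, as in the namespace files; the sample text and its hashes are made up for illustration and are not taken from the real disease-ontology.beleq file.

```python
from collections import defaultdict

# Hypothetical excerpt; the hashes are invented for this example.
SAMPLE_BELEQ = """\
[Values]
Alzheimer's disease|aaaa-0001
Alzheimer disease|aaaa-0001
Parkinson's disease|bbbb-0002
"""


def parse_equivalences(lines, delimiter='|'):
    """Group values by the equivalence hash found after the last delimiter."""
    classes = defaultdict(set)
    in_values = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('['):
            # Only the [Values] section carries value-to-hash entries.
            in_values = line == '[Values]'
            continue
        if in_values and delimiter in line:
            value, _, eq_hash = line.rpartition(delimiter)
            classes[eq_hash].add(value)
    return dict(classes)


classes = parse_equivalences(SAMPLE_BELEQ.splitlines())
```

Splitting on the last delimiter keeps values intact even if they themselves contain the delimiter character.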
Using pybel.manager.database_models and the I/O class pybel.manager.database_models.DefinitionsCacheManager.py built by @LeKono for the Definitions Cache Manager as a template, build a new sqlalchemy model to store this data.
pybel.manager.database_models.Context
and the equivalence class
We should define some conventions on how to handle things, e.g. the naming of tables in the database.
Requested by Reagon and Asif for annotator debugging purposes
Windows may not be the OS of choice for programmers, but for other scientists and students it is often the go-to OS. There is a problem in requests_file with local files on a Windows machine: it cannot handle the Windows os.sep because the URL decoding is implemented with this statement:
path_parts = [unquote(p) for p in url_parts.path.split('/')]
Maybe this is not the most urgent bug to handle, but it is a problem that we should take care of.
This is the monolithic to-do for the data manager portion of the project. This consists of translating the Django models used in the AETIONOMY Knowledgebase to use SQLAlchemy.
For the first try, do NOT focus on speed. We need a working implementation first before optimizing. This should all be done in beautiful, idiomatic Python, and take full advantage of the SQLAlchemy ORM. This also means that pandas is unacceptable.
(pybel.manager.models.Network)
(pybel.manager.graph_cache.GraphCacheManager)
pybel.manager
pybel.to_database(belgraph, conn_str)
pybel.from_database(conn_str)
Reinvestigate all code:
relationship() to automatically make joins
session.query(models.Namespace).filter(models.Namespace.keyword == 'HGNC').one() and transaction management at the session level
Automatically add two-way relations with A relation B and B relation A for correlative relations and other mutual relationships.
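The two-way expansion can be sketched independently of the graph class. The set of relations treated as symmetric below is an assumption for illustration, and expand_two_way is a hypothetical helper that works on plain (source, relation, target) triples.

```python
# Assumed set of symmetric relations; adjust to whatever the project treats as mutual.
TWO_WAY_RELATIONS = {'positiveCorrelation', 'negativeCorrelation', 'association'}


def expand_two_way(edges):
    """Yield every (source, relation, target) edge plus its reverse when the relation is symmetric."""
    for source, relation, target in edges:
        yield source, relation, target
        # Self-loops are already their own reverse, so they are not duplicated.
        if relation in TWO_WAY_RELATIONS and source != target:
            yield target, relation, source


edges = list(expand_two_way([
    ('A', 'positiveCorrelation', 'B'),
    ('A', 'increases', 'C'),
]))
```

Directed causal relations such as increases pass through untouched, so only the mutual relationships gain a reverse edge.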
Change parser, bro!
Inside the networkx.MultiDiGraph data structure, is it reasonable to assign bizarre names to nodes based on their type/namespace/name, or is it better to just keep an ID?
Implement:
ParseElement_ to tuples using ParseElement_.asList and pybel.parser.utils.list2tuple, and come up with a reasonable sorting method for nodes containing lists, like composite, complex, and for lists of modifiers for protein, gene, and RNA variants
Check out the demo notebook
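For the conversion of nested parse results to tuples proposed in the Implement item above, a possible shape is sketched here. The list2tuple below is a guess at what pybel.parser.utils.list2tuple might do, and sorting by string form is just one deterministic ordering that could be chosen; neither is confirmed by the issue.

```python
def list2tuple(value):
    """Recursively convert nested lists into nested tuples so the result is hashable."""
    if isinstance(value, list):
        return tuple(list2tuple(item) for item in value)
    return value


def sort_modifiers(modifiers):
    """Return modifiers as a tuple in a deterministic order (sorted by their string form)."""
    return tuple(sorted((list2tuple(m) for m in modifiers), key=str))


node = list2tuple(['Protein', 'HGNC', 'AKT1', ['pmod', 'Ph', 'Ser', 473]])
ordered = sort_modifiers([['pmod', 'Ph'], ['hgvs', 'p.Ala2Val']])
```

Because tuples are hashable, the converted value can be used directly as a node key in a graph, and a fixed ordering means two statements that list the same modifiers in different orders map to the same node.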
How can data be interchanged between BEL and other systems biology formats, like SBML?
Check INDRA, NDEX, CausalBioNet, and other data sources to see how they're interchanging data
Get hooks for more validator plugins. For example, Reagon asked for the statements that don't have a disease state (normal, AD, etc.) to get flagged.
This could either be done at compile time, or the line number for each statement could be included in the graph.
What format is best? There are many line formats for graphs.
See:
CSV and Excel are getting the backburner for now. Do: GraphML, Node Link JSON, and pickle
Check the BEL 1.0 Syntax for UNSET at item 5.2.3. There are additional ways to unset, including:
UNSET ALL
UNSET {list, list2, ...}
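A hedged sketch of handling these forms with regular expressions, outside the real parser. It assumes annotations are tracked in a plain dict and that unsetting an annotation that is not set should be silently ignored; apply_unset is a hypothetical helper, not the PyBEL implementation.

```python
import re

_UNSET_LIST = re.compile(r'^UNSET\s+\{(?P<names>[^}]*)\}\s*$')
_UNSET_ONE = re.compile(r'^UNSET\s+(?P<name>\w+)\s*$')


def apply_unset(line, annotations):
    """Remove the annotations named by an UNSET line from the current context, in place."""
    list_match = _UNSET_LIST.match(line)
    if list_match:
        names = [name.strip().strip('"') for name in list_match.group('names').split(',')]
    else:
        one_match = _UNSET_ONE.match(line)
        if one_match is None:
            raise ValueError('Not an UNSET statement: {!r}'.format(line))
        names = [one_match.group('name')]
    if names == ['ALL']:
        # UNSET ALL clears the whole annotation context.
        annotations.clear()
        return
    for name in names:
        # Assumption: unsetting a missing annotation is not an error in this sketch.
        annotations.pop(name, None)


context = {'Species': '9606', 'Tissue': 'brain', 'Disease': 'AD'}
apply_unset('UNSET {Species, Tissue}', context)
```

A stricter variant could raise an exception when the named annotation was never set, which would fit the semantic checking discussed elsewhere in these issues.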
really messed up with semantic checking. fix immediately
Add to pybel.parser.parse_exceptions