jervenbolleman / faldo-paper
License: Other
"Figure 5: OWL2 property chain axiom to infer that all positions described in a INSDC record are positioned relative to the main sequence of the record."
I'm suspicious of this - I would have to see the complete ontology with all axioms plus preferably some examples.
Presumably beginOf is the inverse of begin?
Note that property chains are unidirectional. You can infer some relationships given some chain of relationships. But given a relationship, you can't infer that there must be some specified chain of relationships. So you can infer INSDC:reference given the chain, but not the other way round.
I'm not sure the chain beginOf o endOf can ever be satisfied coherently.
e.g.
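The toy sketch below illustrates the two points above: property chains infer in only one direction, and `beginOf o endOf` has nothing to match. It is a hand-rolled, single-pass rule application, not an OWL reasoner. It assumes `ex:beginOf` and `ex:endOf` are inverses of `faldo:begin` and `faldo:end`, as presumed above. `ex:onRecord` and `ex:reference` are hypothetical stand-ins for the properties in the Figure 5 axiom, which we have not seen in full.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def add_inverse(triples: Set[Triple], prop: str, inverse: str) -> Set[Triple]:
    """Materialise `inverse` as owl:inverseOf `prop`: (s prop o) yields (o inverse s)."""
    return triples | {(o, inverse, s) for s, p, o in triples if p == prop}


def apply_chain(triples: Set[Triple], p1: str, p2: str, sup: str) -> Set[Triple]:
    """One pass of the rule  p1 o p2 SubPropertyOf sup.

    Adds (s sup o) whenever (s p1 x) and (x p2 o) hold. It never works
    backwards: an existing (s sup o) does not create any p1/p2 links.
    """
    inferred = {
        (s, sup, o)
        for s, a, x in triples if a == p1
        for x2, b, o in triples if b == p2 and x2 == x
    }
    return triples | inferred


# Hypothetical data: only faldo:begin / faldo:end are real FALDO terms.
data: Set[Triple] = {
    ("ex:region1", "faldo:begin", "ex:pos1"),
    ("ex:region1", "faldo:end", "ex:pos3"),
    ("ex:region1", "ex:onRecord", "ex:seq1"),
    ("ex:pos2", "ex:reference", "ex:seq1"),  # asserted directly, no chain behind it
}

with_inverses = add_inverse(add_inverse(data, "faldo:begin", "ex:beginOf"),
                            "faldo:end", "ex:endOf")

# The chain fires for pos1 ...
result = apply_chain(with_inverses, "ex:beginOf", "ex:onRecord", "ex:reference")
print(("ex:pos1", "ex:reference", "ex:seq1") in result)  # True
# ... but pos2's asserted reference does not imply any beginOf link.
print(any(s == "ex:pos2" and p == "ex:beginOf" for s, p, _ in result))  # False

# beginOf o endOf: the object of beginOf is a region, but the subject of
# endOf is always a position, so the chain never matches.
begin_end = apply_chain(with_inverses, "ex:beginOf", "ex:endOf", "ex:beginThenEnd")
print(begin_end == with_inverses)  # True
```

The chain could only fire if some node were both a region and a position.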
FALDO: a semantic standard describing the location of nucleotide and protein feature annotation.
Currently we mention GVF once in the BioInterchange text (without a definition either in place or in the abbreviations).
We could expand this, perhaps cite Reese et al. (2010) http://dx.doi.org/10.1186/gb-2010-11-8-r88 and include GVF in the abbreviations?
In the short term I have simply removed the mention of GVF and just used GTF and GFF3 as example formats in the BioInterchange description.
"In a process called transplicing exons of one gene can be found on multiple chromosomes"
This is a sufficient but not a necessary condition. Trans-spliced genes can also have exons from the same chromosome, either from a distant site (as is common in C. elegans) or from the same region.
My favorite is mod(mdg4):
Mongelard F, Labrador M, Baxter EM, Gerasimova TI, Corces VG: Trans-splicing as a novel mechanism to explain interallelic complementation in Drosophila.
Genetics 2002, 160:1481-1487
You can get the GFF3 from FlyBase:
http://flybase.org/reports/FBgn0002781.html
(but it may need "re-stitching", as the translation to GFF3 may lose some information)
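The "re-stitching" could look something like the minimal Python sketch below. It assumes the pieces of one trans-spliced feature share the same GFF3 `ID` attribute across lines. We have not checked how FlyBase actually encodes mod(mdg4). Lines are grouped by ID in file order, because when pieces lie on different strands or far apart, their order along the transcript cannot be recovered from coordinates alone. The sequence name and coordinates in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse GFF3 column 9 into tag -> value (percent-decoding not handled)."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            tag, value = item.split("=", 1)
            attrs[tag.strip()] = value
    return attrs


def stitch_by_id(lines: Iterable[str], feature_type: str = "CDS") -> Dict[str, List[Segment]]:
    """Collect the segments of each multi-line feature, keyed by its ID, in file order."""
    parts: Dict[str, List[Segment]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is None:
            continue
        parts[feature_id].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(parts)


# Made-up example: one CDS split over two locations on opposite strands.
example = [
    "##gff-version 3",
    "seq1\tdemo\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=tx1",
    "seq1\tdemo\tCDS\t900\t950\t.\t-\t1\tID=cds1;Parent=tx1",
]
print(stitch_by_id(example))
# {'cds1': [('seq1', 100, 200, '+'), ('seq1', 900, 950, '-')]}
```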
The caption needs to be expanded. The left half of the figure shows the classes, and the indentation and down arrows presumably indicate subclasses [I suspect a tree-like presentation would make this clearer, or at least increased indentation?]. The right side shows the properties, but what is the meaning of the blue icon (rectangle with white on the left end) versus the green icon (plain rectangle)?
Also, more visual separation between the classes (left) and properties (right) might help avoid confusion from any apparent mapping between the class owl:Nothing and the property horizontally aligned next to it (etc.).
What we need from each sub-group is a contribution of 2-3 paragraphs describing your group's hackathon successes and ongoing activities. Also (to save Mark look-up time), please list all authors from your sub-group in that document.
Currently FALDO uses the CC-BY 3.0 licence, should it switch to the newer CC-BY 4.0 licence?
The lowercase predicate faldo:position can be confused with the uppercase class faldo:Position. Should we change one of the labels, or should we point to the convention used in, e.g., DCAT?
In review, the use of the word "sequence" led to confusion, because it was read as referring to the real molecules in nature instead of the thing imported into EMBL.
I think the bacterial folks would be most happy if you address circular genomes, even if it is just to say that it's currently underspecified or not supported, but possible in the future.
We currently only mention the JBrowse example in passing. One idea would be an additional figure showing a JBrowse screenshot, perhaps displaying one of the real examples we already discuss, or a multi-dataset federated query?
Could you please also check out the work of Mauno Vihinen to compare and comment? http://t.co/K4RXGFnPXj
Should we be talking about the ENA, or EMBL Nucleotide Sequence Database (EMBL-Bank), or both?
http://www.ebi.ac.uk/ena/about/formats
"Data tiers within ENA provide a level of abstraction from the underlying infrastructure that has resulted from the integration of three databases: the EMBL Nucleotide Sequence Database (EMBL-Bank), the Trace Archive and the Sequence Read Archive (SRA)."
Currently we mostly use ENA, but there are still references to EMBL (not currently in the abbreviations table). Probably in terms of annotation, we're mainly concerned with EMBL-Bank (as part of the triple mirror under the INSDC with NCBI/GenBank and DDBJ).
This is a tricky issue and might require a URI change for the ontology :(
http://biohackathon.org/resources/faldo is 302 redirected to http://www.biohackathon.org/resource/faldo then is redirected to http://78462f86-a-7141bcef-s-sites.googlegroups.com/a/biohackathon.org/www/resource/faldo and is finally redirected to https://78462f86-a-7141bcef-s-sites.googlegroups.com/a/biohackathon.org/www/resource/faldo by Google Sites. It might be inconvenient for some applications.
Is there a way we could set up content negotiation and automatic conversion between formats for FALDO?
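The negotiation step itself could be as simple as the minimal sketch below. It assumes we host the files ourselves (not on Google Sites), with one pre-generated serialisation per format. The file names and the RDF/XML default are hypothetical. The sketch picks a file by q-value from the HTTP Accept header. It simplifies HTTP's full rules by ignoring the specificity precedence between overlapping media ranges.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: one pre-generated serialisation of the ontology per media type.
AVAILABLE: Dict[str, str] = {
    "application/rdf+xml": "faldo.rdf",
    "text/turtle": "faldo.ttl",
    "application/ld+json": "faldo.jsonld",
}
DEFAULT = "application/rdf+xml"  # assumption: what to serve for */* or no header


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges: List[Tuple[str, float]] = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((pieces[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(header: str) -> Optional[str]:
    """Return the file to serve, or None (the server would answer 406)."""
    if not header.strip():
        return AVAILABLE[DEFAULT]
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue
        if media_range == "*/*":
            return AVAILABLE[DEFAULT]
        if media_range.endswith("/*"):
            prefix = media_range[:-1]  # "text/*" -> "text/"
            for media_type, path in AVAILABLE.items():
                if media_type.startswith(prefix):
                    return path
        elif media_range in AVAILABLE:
            return AVAILABLE[media_range]
    return None


print(negotiate("application/ld+json;q=0.5, text/turtle;q=0.9"))  # faldo.ttl
print(negotiate("text/html"))  # None
```

Pre-generating each format at release time keeps the server trivial. The "auto conversion" would then happen in the release process rather than per request.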
I think it's a good idea to have a release process with automatic validation, JUnit test suites, and publishing of the ontology in different forms.
As the biohackathon.org web site is run by Google Sites, I don't know how much control we can have over it...
One possible solution would be to host those resources (including ontology files) on another server by assigning a new subdomain (e.g., purl.biohackathon.org); however, this requires a change of the ontology URI.
Alternatively, we may keep the current way but put the versions of FALDO files also on BioPortal.
In the main text,
As part of the Integrated Database Project (http://lifesciencedb.mext.go.jp/en/)
and the Core Technology Development Program (http://biosciencedbc.jp/en/tec-dev-prog/programs)
to integrate life science databases in Japan, the National Bioscience Database
Center (NBDC) and the Database Center for Life Science (DBCLS) have hosted
an annual “BioHackathon” series of meetings bringing together biological
database teams, open source programmers, and domain experts in Semantic
Web and Linked Data [6,7].
Given that the text covers the entire series, also citing the 1st and 2nd meetings makes sense to me [Katayama et al. 2010, 2011].
http://dx.doi.org/10.1186/2041-1480-1-8
http://dx.doi.org/10.1186/2041-1480-2-4
Figure 1 is quite visually appealing, but I think could do with some improvement.
First of all, this should probably be figure 2. Figure 1 should orient the reader and give them some kind of overview. Perhaps the SubClass hierarchy of FALDO. This figure is already getting down in the weeds with some quite specific details.
In fact it may be better to precede this with a figure comparing a chunk of GFF with a FALDO instance graph, giving a "bigger picture" view.
Comments on the figure as it stands:
See also #14
We have strong evidence of its power in that FALDO can handle all of the annotations in INSDC/DDBJ and UniProt, but biological systems have a habit of throwing up stranger cases. I therefore feel that the current wording in the abstract and conclusion is too strong: "expressive enough to describe all known biological use cases accurately" and "power to describe all biological feature positions". As a reviewer I would ask for this language to be toned down.
Figure 2: OWL2 property chain axiom - this refers to faldo:endOf. In the current version of faldo, there are no inverses declared. These should be added to faldo, or the document should substitute the named properties with OWL inverse property expressions (which starts to look ugly in RDF syntax).
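Once inverses are added, a release-time check (in the spirit of the automatic validation suggested earlier) could list object properties that still have no declared inverse, for review. Below is a minimal sketch using Python's standard-library ElementTree. It assumes an RDF/XML serialisation in which properties appear as `owl:ObjectProperty` elements with an `rdf:about` attribute and `owl:inverseOf` children. Declarations written as `rdf:Description` with `rdf:type` would be missed, so a real check should use a proper RDF parser. The example namespace and the `beginOf` name are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Set

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def properties_without_inverse(rdf_xml: str) -> Set[str]:
    """Return owl:ObjectProperty IRIs not linked by owl:inverseOf
    (in either direction) to another property in the same document."""
    root = ET.fromstring(rdf_xml)
    declared: Set[str] = set()
    paired: Set[str] = set()
    for prop in root.iter(f"{{{OWL}}}ObjectProperty"):
        iri = prop.get(f"{{{RDF}}}about")
        if iri is None:
            continue
        declared.add(iri)
        for inv in prop.findall(f"{{{OWL}}}inverseOf"):
            target = inv.get(f"{{{RDF}}}resource")
            if target:
                paired.update({iri, target})
    return declared - paired


# Placeholder namespace and property names, not the published ontology.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="http://example.org/faldo#begin"/>
  <owl:ObjectProperty rdf:about="http://example.org/faldo#beginOf">
    <owl:inverseOf rdf:resource="http://example.org/faldo#begin"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://example.org/faldo#end"/>
</rdf:RDF>"""

print(properties_without_inverse(SAMPLE))  # {'http://example.org/faldo#end'}
```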
We say in the text that BioPerl now includes a FALDO feature exporter. From which version onwards, and is this in the main bundle or separate?
The current text has a number of text-only "Figures", e.g. showing a partial INSDC feature table, or a fragment of a UniProt flat file, and the FALDO equivalent. It would brighten up the paper (and hopefully explain the annotation example more immediately) if these were supplemented with an actual figure.
I could probably produce some line art and/or generate figures using Biopython's GenomeDiagram for this if people thought it would be a sensible addition.