faldo-paper's People

Contributors

cmungall, fstrozzi, jervenbolleman, kimjbaran, ktym, peterjc, rbuels, rjpbonnal

faldo-paper's Issues

Figure 5: property chain

"Figure 5: OWL2 property chain axiom to infer that all positions described in a INSDC record are positioned relative to the main sequence of the record."

I'm suspicious of this - I would have to see the complete ontology with all axioms plus preferably some examples.

Presumably beginOf is the inverse of begin?

Note that property chains are unidirectional. You can infer some relationships given some chain of relationships. But given a relationship, you can't infer that there must be some specified chain of relationships. So you can infer INSDC:reference given the chain, but not the other way round.

I'm not sure the chain beginOf o endOf can ever be satisfied coherently.
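To make the direction of inference concrete, here is a minimal sketch of forward-chaining a two-step property chain over plain triples. The predicate names (ex:beginOf, ex:locatedOn, ex:reference) and the chain itself are placeholders I made up for illustration; they are not the axiom from Figure 5, and a real OWL reasoner does considerably more than this.

```python
# Triples are (subject, predicate, object) tuples. All ex: names are hypothetical.

def apply_chain(triples, chain, inferred):
    """Forward-chain one property chain: chain[0] o chain[1] -> inferred."""
    p1, p2 = chain
    new = set()
    for s, p, mid in triples:
        if p != p1:
            continue
        # Look for a second triple that continues from the intermediate node.
        for s2, q, o in triples:
            if q == p2 and s2 == mid:
                new.add((s, inferred, o))
    return new

chain = ("ex:beginOf", "ex:locatedOn")

# Chain present: the shortcut relationship can be inferred.
with_chain = {
    ("ex:pos1", "ex:beginOf", "ex:region1"),
    ("ex:region1", "ex:locatedOn", "ex:record1"),
}
forward = apply_chain(with_chain, chain, "ex:reference")

# Only the shortcut present: nothing about the chain follows from it.
only_shortcut = {("ex:pos1", "ex:reference", "ex:record1")}
backward = apply_chain(only_shortcut, chain, "ex:reference")

print(forward)
print(backward)
```

With the chain present the sketch derives the single ex:reference triple; with only the ex:reference triple present it derives nothing, which is the one-way behaviour described above.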

Genome Variation Format (GVF)

Currently we mention GVF once in the BioInterchange text (without definition in place, or in the abbreviations).

We could expand this, perhaps cite Reese et al. (2010) http://dx.doi.org/10.1186/gb-2010-11-8-r88 and include GVF in the abbreviations?

In the short term I have simply removed the mention of GVF and just used GTF and GFF3 as example formats in the BioInterchange description.

Trans-splicing example

"In a process called transplicing exons of one gene can be found on multiple chromosomes"

This is a sufficient but not a necessary condition. Trans-spliced genes can have exons from the same chromosome, either from a distant site (as is common in C. elegans) or from the same region.

My favorite is mod(mdg4):

Mongelard F, Labrador M, Baxter EM, Gerasimova TI, Corces VG: Trans-splicing as a novel mechanism to explain interallelic complementation in Drosophila.
Genetics 2002, 160:1481-1487

You can get the GFF3 from FlyBase:

http://flybase.org/reports/FBgn0002781.html

(but it may need "re-stitching", as the translation to GFF3 may lose some information)

(new) Figure 1: The classes and object properties used in FALDO

The caption needs to be expanded: The left half of the figure shows the classes, and the indentation and down arrows presumably indicate subclasses [I suspect a tree-like presentation would make this clearer, or at least increasing the indentation?]. The right side shows the properties, but what is the meaning of the blue icon (rectangle with white on the left end) versus the green icon (plain rectangle)?

Also, more visual separation between the classes (left) and the properties (right) might help avoid suggesting a mapping between horizontally aligned items, e.g. between the class owl:Nothing and the property after.

Write paragraphs for main paper

What we need from each sub-group is a contribution of 2-3 paragraphs describing your group's hackathon successes and ongoing activities. Also (to save Mark look-up time) please list all authors from your sub-group in that document.

Position position

The lower-case predicate faldo:position is easily confused with the upper-case class faldo:Position. Should we change one of the labels, or should we point to the convention as used in e.g. DCAT?

Address circular genomes

I think the bacterial folks would be most happy if you address circular genomes, even if it is just to say that it's currently underspecified or not supported, but possible in the future.

JBrowse screenshot as new Figure?

We currently only mention the JBrowse example in passing. One idea would be an additional figure showing a JBrowse screenshot, perhaps displaying one of the real examples we already discuss, or a multi-dataset federated query?

ENA vs EMBL-Bank

Should we be talking about the ENA, or EMBL Nucleotide Sequence Database (EMBL-Bank), or both?

http://www.ebi.ac.uk/ena/about/formats
"Data tiers within ENA provide a level of abstraction from the underlying infrastructure that has resulted from the integration of three databases: the EMBL Nucleotide Sequence Database (EMBL-Bank), the Trace Archive and the Sequence Read Archive (SRA)."

Currently we mostly use ENA, but there are still references to EMBL (not currently in the abbreviations table). Probably in terms of annotation, we're mainly concerned with EMBL-Bank (as part of the triple mirror under the INSDC with NCBI/GenBank and DDBJ).

Alternative formats/content negotiation

This is a tricky issue and might require a URI change of the ontology :(

http://biohackathon.org/resources/faldo is 302-redirected to http://www.biohackathon.org/resource/faldo, then to http://78462f86-a-7141bcef-s-sites.googlegroups.com/a/biohackathon.org/www/resource/faldo, and finally to https://78462f86-a-7141bcef-s-sites.googlegroups.com/a/biohackathon.org/www/resource/faldo by Google Sites. This chain of redirects might be inconvenient for some applications.

Is there a way we could set up content-type negotiation and automatic conversion between formats for FALDO?

I think it's a good idea to have a release process with automatic validation, JUnit suites and publishing of the ontology in different forms.

As the biohackathon.org web site is run by Google Sites, I don't know how much control we can have over it...

One possible solution would be to host those resources (including the ontology files) on another server under a new subdomain (e.g., purl.biohackathon.org); however, this requires a change of the ontology URI.
Alternatively, we could keep the current setup and also put the versions of the FALDO files on BioPortal.
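
As a rough illustration of what content-type negotiation would involve on whichever server ends up hosting the files, here is a minimal sketch that picks a serialization from an HTTP Accept header. The file names in AVAILABLE and the HTML default are assumptions for the example, not existing FALDO artifacts, and the parsing is deliberately simplified (no wildcard subtypes such as text/*).

```python
# Hypothetical mapping from media type to a published ontology file.
AVAILABLE = {
    "text/turtle": "faldo.ttl",
    "application/rdf+xml": "faldo.rdf",
    "text/html": "faldo.html",
}

def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when the client does not specify it
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, default="text/html"):
    """Pick the best available representation, or None if nothing matches."""
    for media, q in parse_accept(header):
        if q <= 0:
            continue
        if media in AVAILABLE:
            return media, AVAILABLE[media]
        if media == "*/*":
            return default, AVAILABLE[default]
    return None

print(negotiate("text/turtle;q=0.9, application/rdf+xml"))
print(negotiate("application/json"))
```

A Semantic Web client asking for Turtle or RDF/XML would get the matching file, while a browser sending text/html or */* would get the human-readable page. A request that matches nothing returns None, which a server would typically translate into a 406 response.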

Cite 1st and 2nd BioHackathon papers too?

In the main text,

As part of the Integrated Database Project (http://lifesciencedb.mext.go.jp/en/)
and the Core Technology Development Program (http://biosciencedbc.jp/en/tec-dev-prog/programs)
to integrate life science databases in Japan, the National Bioscience Database
Center (NBDC) and the Database Center for Life Science (DBCLS) have hosted
an annual “BioHackathon” series of meetings bringing together biological
database teams, open source programmers, and domain experts in Semantic
Web and Linked Data [6,7].

Since the text covers the entire series, it makes sense to me to also cite the papers for the 1st and 2nd meetings [Katayama et al. 2010, 2011].
http://dx.doi.org/10.1186/2041-1480-1-8
http://dx.doi.org/10.1186/2041-1480-2-4

Figure 1 changes

Figure 1 is quite visually appealing, but I think it could do with some improvement.

First of all, this should probably be figure 2. Figure 1 should orient the reader and give them some kind of overview. Perhaps the SubClass hierarchy of FALDO. This figure is already getting down in the weeds with some quite specific details.

In fact it may be better to precede this with a figure comparing a chunk of GFF with a FALDO instance graph, giving a "bigger picture" view.

Comments on the figure as it stands:

  • The blank node notation ("_:foo") is quite geeky and probably needs to be explained.
  • Should chr1 be a blank node?
  • The diagram uses the same convention (circles) for resources and literals. Maybe it's just me, but I found this confusing.
  • The usual W3C convention of circles=classes, boxes=individuals is reversed.
  • 1(a) and 1(b) should be reversed. Introduce the problem ("how do we represent this thing you're familiar with") and then show the solution ("here is how it is represented in FALDO")
  • The notation "a" should be defined (= rdf:type)
  • What is the type of fr and rr?
  • I would prefer a more concrete example. What kind of feature is this? What's its ID?
  • It's not clear from the diagram but for this to be useful :fr and :rr should connect to some feature of interest

See also #14

Claiming to handle all biological use cases

We have strong evidence of FALDO's power in that it can handle all of the annotations in INSDC/DDBJ and UniProt, but biological systems have a habit of throwing up ever stranger cases. I therefore feel that the current wording in the abstract and conclusion is too strong: "expressive enough to describe all known biological use cases accurately" and "power to describe all biological feature positions". As a reviewer I would ask for this language to be toned down.

inverse properties

Figure 2: OWL2 property chain axiom - this refers to faldo:endOf. In the current version of FALDO, no inverses are declared. Either these should be added to FALDO, or the document should replace the named properties with OWL inverse property expressions (which start to look ugly in RDF syntax).
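
For illustration only, this is roughly what declaring the inverses would buy us, sketched over plain triples. It assumes faldo:endOf is meant as the inverse of faldo:end and faldo:beginOf as the inverse of faldo:begin, which is not declared in the ontology as it stands; the ex: identifiers are made up.

```python
# Assumed inverse pairs; not declared in the current version of FALDO.
INVERSE_OF = {
    "faldo:begin": "faldo:beginOf",
    "faldo:end": "faldo:endOf",
}

def materialize_inverses(triples, inverse_of):
    """Return the input triples plus one inverse triple per matching predicate."""
    out = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            out.add((o, inverse_of[p], s))
    return out

region = {
    ("ex:region1", "faldo:begin", "ex:pos1"),
    ("ex:region1", "faldo:end", "ex:pos2"),
}
expanded = materialize_inverses(region, INVERSE_OF)
print(sorted(expanded))
```

Without such a declaration, a chain that mentions faldo:endOf has nothing to match against in data that only uses faldo:end.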

Details for BioPerl's FALDO exporter

The text says that BioPerl now includes a FALDO feature exporter. From which version onwards, and is it in the main bundle or distributed separately?

Visual diagrams to supplement the (text only) examples?

The current text has a number of text-only "Figures", e.g. showing a partial INSDC feature table, or a fragment of a UniProt flat file, and the FALDO equivalent. It would brighten up the paper (and hopefully explain the annotation example more immediately) if these were supplemented with an actual figure.

I could probably produce some line art and/or generate figures using Biopython's GenomeDiagram for this if people thought it would be a sensible addition.
