
theelnfileformat's Introduction

ELN file format (.eln)

Description

The ELN file format is an archive format for exchange of experimental results and data.

This file format can be created and read by software such as Electronic Laboratory Notebooks.

Specification

The file format is specified in the SPECIFICATION.md file. It follows the RO-Crate specification and bundles the files into a ZIP archive.

The .eln format is registered with IANA as a media type (formerly known as a MIME type), application/vnd.eln+zip; see the IANA assignment: https://www.iana.org/assignments/media-types/application/vnd.eln+zip

Origin

This file format was created to improve interoperability between ELN software. The developers of these programs decided it would be useful for users to have an archive format that several ELNs can easily understand.

Status

Generally working with some quirks here and there.

Known implementations

Implementation   Example
eLabFTW          elabftw
Kadi4Mat         kadi4mat
PASTA            PASTA
SampleDB         SampleDB
RSpace           RSpace

theelnfileformat's People

Contributors

florianrhiem, jmanideep, nhanlon2, nicolascarpi, stain, steffenbrinckmann


theelnfileformat's Issues

Trailing slashes in context URLs

After having serious trouble exporting correct JSON-LD, I think the RO-Crate spec has an error: The @context URIs should end in a “/”, shouldn’t they? For instance:

{
  "@context": {
    "@vocab": "https://w3id.org/ro/crate/1.1/context/",
    "sm": "http://scimesh.org/SciMesh/"
  },
  "@graph": [
    {
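Why the trailing "/" matters once a URL is used as @vocab: JSON-LD expands an otherwise undefined term by plain string concatenation onto the vocabulary IRI, so nothing inserts the separator for you. The sketch below implements only that simplified rule plus simple prefix:suffix terms such as sm:Sample; it is not a full JSON-LD processor, and the function name expand_term is hypothetical.

```python
def expand_term(term: str, context: dict[str, str]) -> str:
    """Expand a JSON-LD term using only @vocab and simple prefixes.

    Deliberately simplified: ignores term definitions, @base, remote
    contexts and every other JSON-LD feature.
    """
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if prefix in context and not suffix.startswith("//"):
            return context[prefix] + suffix  # compact IRI, e.g. sm:Sample
        return term  # treat as an absolute IRI, e.g. https://...
    vocab = context.get("@vocab")
    if vocab is None:
        raise ValueError(f"no @vocab to expand {term!r}")
    return vocab + term  # plain string concatenation: no "/" is inserted


with_slash = {"@vocab": "https://w3id.org/ro/crate/1.1/context/",
              "sm": "http://scimesh.org/SciMesh/"}
without_slash = {"@vocab": "https://w3id.org/ro/crate/1.1/context"}

print(expand_term("name", with_slash))       # https://w3id.org/ro/crate/1.1/context/name
print(expand_term("name", without_slash))    # https://w3id.org/ro/crate/1.1/contextname
print(expand_term("sm:Sample", with_slash))  # http://scimesh.org/SciMesh/Sample
```

Whether the RO-Crate context URL is meant to be used as an @vocab at all is a separate question; the sketch only shows why the final character matters once it is.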

List ELN implementations

Hi, would you be able to list/link implementations of TheELNFileFormat on the README?

The challenge is that "ELN" is an overloaded term, so it's a bit tricky to search for in docs or on Google.

So far I think the implementations are: eLabFTW, RSpace, PASTA-ELN (documentation link?)

https://github.com/TheELNConsortium/#participating-eln-projects has a bit more but I guess not all have implemented ELN format yet.

Then I can update https://github.com/ResearchObject/ro-crate/blob/master/docs/profiles.md#electronic-lab-notebook-eln or make a separate "in-use" page from the RO-Crate web site to help promote further.

Citations like https://doi.org/10.1186/s13326-021-00257-x would also be useful to capture!

Question concerning directory structure

I read the specification as saying that the .eln file and the folder inside it should have the same name:

Inside a .eln file, there MUST be a folder that will contain the rest of the data. The name of the folder SHOULD be the same as the archive name. This folder at root prevents issues when opening the file as a zip file and getting archived files extracted in the current directory, possibly overwriting other files, and probably polluting the current directory. […]

<root>
  some-data.eln/
    - ro-crate-metadata.json
    - experimentA/:[…]

However, to my knowledge, that would not work: unzipping the .eln file would overwrite the archive being unzipped with a directory of the same name.
The examples therefore do not conform to my reading of the text either: they either use a completely different name (acceptable, because the text says SHOULD and not MUST), or the .eln/ZIP file contains a directory without the .eln extension in its name (collections-example.eln -> collections-example/).

I would be pleased if the wording could be changed so that it cannot be interpreted the way I did, or if the examples in this repository showed the described behaviour, together with a description of how to create and extract ZIP files whose folder is named exactly like the archive.
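The layout the examples actually use (the archive name minus the .eln extension as the root folder) can be produced and checked with Python's standard zipfile module. A minimal sketch; the helper names write_eln and root_folder are hypothetical and not part of the specification:

```python
import zipfile
from pathlib import Path, PurePosixPath


def write_eln(archive: Path, files: dict[str, bytes]) -> None:
    """Write files under one root folder named after the archive without its
    .eln extension (collections-example.eln -> collections-example/)."""
    root = archive.stem
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel_path, data in files.items():
            zf.writestr(f"{root}/{rel_path}", data)


def root_folder(archive: Path) -> str:
    """Return the name of the single top-level folder, or raise ValueError."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    # Any regular file directly at the top level violates the layout.
    if any(len(PurePosixPath(n).parts) < 2 for n in names if not n.endswith("/")):
        raise ValueError("archive has files outside a root folder")
    tops = {PurePosixPath(n).parts[0] for n in names}
    if len(tops) != 1:
        raise ValueError(f"expected one root folder, found {sorted(tops)}")
    return tops.pop()
```

Extracting such an archive next to itself yields collections-example/ beside collections-example.eln, so the archive is not overwritten; a folder literally named collections-example.eln/ would collide with the file.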

Sensolytics example file

Hi,
I tried to add a first example for our measurement data, but I didn't have permission to push a branch to the repository to create a pull request. That might not be necessary anyway, as the main reason was to see whether the participating systems can read the format. I've attached the README and the .eln file here; let me know if we can or should change anything...
sensolytics.zip

Flexible metadata in .eln files

Currently, the .eln format does not have a unified way of exporting flexible metadata. Instead, most ELNs export a data structure specific to that ELN in JSON format which contains various information about a dataset, including some flexible metadata. As they are (mostly) representations of internal models, they vary quite a bit.

Motivation

While it is already useful to be able to reference samples, measurements and other objects from other ELNs with some generic metadata such as the creation and modification times and the author, it would be even better if we could exchange flexible metadata about these. In the last meeting, we briefly discussed the goal of a "gold standard" experiment that can be represented as an .eln file, imported and exported by the various ELNs. For this, we should be able to exchange information such as instrument or process parameters. We cannot expect to strictly define these; instead, they should map a (textual) identifier to data of some type.

Ideas / Suggestions

For mapping identifiers to values, the PropertyValue type should be useful: it maps its propertyID to its value, which can be a boolean, text, a number or the generic StructuredValue type, and it also supports units and a human-readable version as a fallback. So, if we had to store a temperature with the identifier target_temperature, it could be represented as:

{
  "@type": "PropertyValue",
  "propertyID": "target_temperature",
  "value": 293.15,
  "unitCode": "KEL",
  "unitText": "K",
  "description": "293.15 K"
}

while a boolean instrument setting could be represented like this:

{
  "@type": "PropertyValue",
  "propertyID": "vacuum_enabled",
  "value": false
}

If we provided an array of such property values, we could support a flat mapping of identifiers to values. The Dataset type, which we use for datasets in the ro-crate-metadata.json, already has such an array in its variableMeasured property; however, this use case goes beyond "variables that are measured in some dataset". So we could either use that property and stretch its definition quite a bit, use another existing property, or branch off and define a custom property.
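A flat identifier-to-value mapping could be turned into such an array with a small helper. A sketch, assuming the variableMeasured option discussed above and a tuple convention (number, unitCode, unitText) for values with units; the helper name, that convention and the dataset @id are all hypothetical:

```python
import json
from typing import Any


def to_property_values(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a flat identifier->value mapping into schema.org PropertyValue
    objects. Values given as (number, unitCode, unitText) get unit fields."""
    props = []
    for identifier, value in metadata.items():
        prop: dict[str, Any] = {"@type": "PropertyValue", "propertyID": identifier}
        if isinstance(value, tuple):
            number, unit_code, unit_text = value
            prop.update(value=number, unitCode=unit_code, unitText=unit_text,
                        description=f"{number} {unit_text}")  # human-readable fallback
        else:
            prop["value"] = value
        props.append(prop)
    return props


dataset = {
    "@id": "./experiment-a/",  # hypothetical dataset id
    "@type": "Dataset",
    # One of the options discussed: reuse variableMeasured for the array.
    "variableMeasured": to_property_values({
        "target_temperature": (293.15, "KEL", "K"),
        "vacuum_enabled": False,
    }),
}
print(json.dumps(dataset, indent=2))
```

The two generated entries match the JSON examples above; switching to another existing property or a custom one would only change the key the array is stored under.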

As PropertyValue objects can contain a value of type StructuredValue, of which PropertyValue is itself a subtype, it might also be possible to implement nested data structures this way. Alternatively, the structure could be encoded in the identifier.
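For the identifier-based alternative, nested metadata could be flattened into path-like identifiers before building PropertyValue objects. A sketch, assuming "." as the separator, which is an arbitrary choice and not something agreed on:

```python
from typing import Any


def flatten(metadata: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Collapse nested dicts into dotted identifiers: {"laser": {"power": 5}}
    becomes {"laser.power": 5}."""
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        identifier = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, identifier, sep))  # recurse into sub-structure
        else:
            flat[identifier] = value
    return flat


print(flatten({"laser": {"power_mw": 5, "mode": "cw"}, "vacuum_enabled": False}))
# {'laser.power_mw': 5, 'laser.mode': 'cw', 'vacuum_enabled': False}
```

The trade-off is that importers must split on the same separator to rebuild the hierarchy, and keys that already contain the separator become ambiguous; nesting via StructuredValue avoids that at the cost of a deeper JSON structure.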


What are your thoughts on storing flexible metadata in PropertyValue objects? Which solution for attaching these to the datasets do you prefer? How should we deal with nested metadata? Would you prefer to store flexible metadata outside the ro-crate-metadata.json entirely or in a custom format instead?

Interoperability

Hi,

When testing interoperability between Kadi4Mat, eLabFTW and RSpace (the ones I currently have access to), I found that the expected behaviour occurred in only one case: exporting from Kadi4Mat and importing into eLabFTW (see the table below). I'm using the latest version of eLabFTW, the hosted demo instance of Kadi4Mat and RSpace Community. This is a continuation of this post, where I've continued to use minimal working examples (MWEs): single entries, each containing text and a tag.

Import (below) / Export (right)   Kadi4Mat   eLabFTW   RSpace
Kadi4Mat                          n/a
eLabFTW                                      n/a
RSpace                                                 n/a

Signatures for .eln files

Currently, there is no mechanism to ensure that an .eln file was actually created by a specific ELN, which raises issues of data provenance and of how far the information inside the .eln file can be trusted.

Motivation

While ELNs generally assume that the information from users can be trusted, there are cases where it may be necessary to know that the information was entered at a specific time by a specific person, and where proof is needed for this instead of relying on the users to be trustworthy. To achieve this, many ELNs include mechanisms such as timestamping, versioning or signing of entered information.

For those ELNs, importing an .eln file must not circumvent these mechanisms, as that would render them useless; all imported information therefore has to be marked as coming from an import by user X at date/time Y. A chain of trust for information from .eln files could potentially improve the situation, as another ELN might be more trustworthy than users in these cases. It would still be necessary to mark the information as coming from an import, but it could be marked as coming from a specific ELN instead of from a user.

Ideas / Suggestions

The .eln file consists of an ro-crate-metadata.json and various other files, which should be listed in the ro-crate-metadata.json file. If possible, the ro-crate-metadata.json should include SHA256 hash values for those files, which allows us to trust that those files have not been tampered with (or suffered from data rot) as long as the ro-crate-metadata.json itself is trustworthy. As such, implementing a system of trust for the ro-crate-metadata.json should be satisfactory to ensure that the whole .eln file can be trusted.
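An importer could check such hashes roughly as follows. A sketch, assuming each File entity stores its hash in a sha256 property (schema.org lists a sha256 property; which property .eln files should use is an assumption here) and that the archive uses a single root folder containing ro-crate-metadata.json:

```python
import hashlib
import json
import zipfile
from pathlib import PurePosixPath


def verify_hashes(archive_path: str) -> list[str]:
    """Return the @ids of File entities that are missing or whose bytes
    do not match their sha256 value."""
    with zipfile.ZipFile(archive_path) as zf:
        names = set(zf.namelist())
        # Assumes the layout <root>/ro-crate-metadata.json
        meta_name = next((n for n in names
                          if PurePosixPath(n).name == "ro-crate-metadata.json"
                          and len(PurePosixPath(n).parts) == 2), None)
        if meta_name is None:
            raise ValueError("no <root>/ro-crate-metadata.json found")
        root = PurePosixPath(meta_name).parent
        graph = json.loads(zf.read(meta_name))["@graph"]

        mismatches = []
        for entity in graph:
            types = entity.get("@type", [])
            if isinstance(types, str):
                types = [types]
            expected = entity.get("sha256")  # assumed property name
            if "File" not in types or expected is None:
                continue
            member = str(root / entity["@id"])  # "./a.csv" -> "<root>/a.csv"
            if member not in names:
                mismatches.append(entity["@id"])  # listed but missing
                continue
            digest = hashlib.sha256(zf.read(member)).hexdigest()
            if digest != expected.lower():
                mismatches.append(entity["@id"])
    return mismatches
```

This only proves that the files match the metadata; trusting the metadata itself still needs the signature discussed next.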

A typical approach would be to provide a signature for the ro-crate-metadata.json alongside the file itself. To do this, we need to decide which digital signature scheme to use, how keys are discovered and how the signature should be stored inside the .eln file.

Digital signature scheme

There are various schemes for generating keys, signing a series of bytes and verifying the signature, and I'm not knowledgeable enough in this area to suggest a specific scheme. I would suggest using an already widely used and supported scheme, though.

Key distribution/discovery

As digital signature schemes use a key (or rather, a key pair) to create the signature, we will need a method of distributing or discovering the key (or rather, the public key) used for an .eln file.

One approach would be to have a chain of trust for those keys, similar (or ideally identical) to the one used for X.509 certificates in TLS/HTTPS. This would have the advantage that it's a well-known scheme, already widely implemented, and most web-based ELNs would already have a certificate (and associated key). The certificate chain could be provided alongside the signature, so that the signature can be checked as long as the root certificate authority is known (and trusted) and the certificates used for the .eln file have not expired or been revoked.

Another approach that would work for web-based ELNs would be to either query the origin ELN for its public key via HTTPS, or to submit the signature and ro-crate-metadata.json to the origin ELN for validation. The latter would have the advantage that we do not need to agree on a digital signature scheme at all, and that there's no need for key pairs, etc. as the signature could be generated and validated in various ways depending on the preferences of the ELN developer. The big disadvantage for both of these, of course, is that the origin ELN would need to be accessible, which is not always the case due to network issues or simply ELNs running behind a firewall.

Signature storage

The signature could either be stored in an additional file at a well-known location inside the .eln archive or be included in the extra field of the ZIP file. I personally think the simplicity of storing it in a file might be preferable, but for programs both methods should be equally easy to implement.
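For the file-based option, a minimal sketch with Python's zipfile module could look like this. The file name ro-crate-metadata.json.sig and the helper names are hypothetical, and `sign` is a placeholder callable because no signature scheme has been chosen yet:

```python
import zipfile
from pathlib import PurePosixPath
from typing import Callable

# Hypothetical well-known location: a .sig file next to the metadata file.
SIGNATURE_NAME = "ro-crate-metadata.json.sig"


def _metadata_name(zf: zipfile.ZipFile) -> str:
    for name in zf.namelist():
        if PurePosixPath(name).name == "ro-crate-metadata.json":
            return name
    raise ValueError("no ro-crate-metadata.json in archive")


def add_detached_signature(archive_path: str, sign: Callable[[bytes], bytes]) -> str:
    """Sign the exact bytes of ro-crate-metadata.json and store the result
    beside it. `sign` stands in for whatever scheme is eventually agreed on."""
    with zipfile.ZipFile(archive_path) as zf:
        meta_name = _metadata_name(zf)
        signature = sign(zf.read(meta_name))
    sig_name = str(PurePosixPath(meta_name).with_name(SIGNATURE_NAME))
    with zipfile.ZipFile(archive_path, "a") as zf:  # append without rewriting
        zf.writestr(sig_name, signature)
    return sig_name


def read_detached_signature(archive_path: str) -> tuple[bytes, bytes]:
    """Return (metadata bytes, signature bytes) for the importer to verify."""
    with zipfile.ZipFile(archive_path) as zf:
        meta_name = _metadata_name(zf)
        sig_name = str(PurePosixPath(meta_name).with_name(SIGNATURE_NAME))
        return zf.read(meta_name), zf.read(sig_name)
```

For a quick test a hash function can stand in for `sign`, but a hash is not a signature: only a real scheme (for example, one tied to the origin ELN's X.509 certificate) would let an importer attribute the file to a specific ELN. The signed bytes must also not be re-serialized after signing, since any change to ro-crate-metadata.json invalidates the signature.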


Personally, I think piggybacking on the infrastructure and expertise behind TLS/HTTPS would be easiest. Done this way, importing an .eln file would have the same security as importing the information directly from the origin ELN via HTTPS, and the origin ELN's domain could be shown as the source of the information without an additional caveat.

What are your thoughts on this?

additionalType

Hello all!

I'm working on making it possible to create a .eln that contains both experiments and resources (the two main entity types in eLabFTW). Previously, it could contain only one or the other.

To convey that information, I chose to use the additionalType property of nodes with "@type": "Dataset".

So during import, I look at the additionalType property to figure out whether a node describes an experiment or a resource (which in eLabFTW can be anything, like an antibody or a microscope).

The value could simply be "experiments" or "resources", but I'm opening this issue to discuss with you what the best value would be.

I'm thinking of using http://semanticscience.org/resource/SIO_000994. The code then basically is: if it's that URI, it's an experiment; otherwise, it's a resource. There are many other terms to choose from: https://lov.linkeddata.es/dataset/lov/terms?q=Experiment
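The import rule just described is small enough to sketch in full. This assumes additionalType holds either a single URI string or a list of them; the function name entity_kind is hypothetical:

```python
# URI proposed above for experiments; everything else is treated as a resource.
EXPERIMENT_TYPE = "http://semanticscience.org/resource/SIO_000994"


def entity_kind(node: dict) -> str:
    """Return "experiments" or "resources" for a Dataset node, based on
    additionalType, which may be a single string or a list."""
    extra = node.get("additionalType", [])
    if isinstance(extra, str):
        extra = [extra]
    return "experiments" if EXPERIMENT_TYPE in extra else "resources"
```

Note that a missing additionalType also falls through to "resources", which matches the rule as stated but means files from ELNs that never set the property would be imported as resources.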

So if you've already dug into these aspects on your side, please let me know here so that I can use the same term as you!

Incremental updates to .eln

How do we want to handle updates to the .eln?

My suggestion: use the CouchDB approach, which prevents competing changes from overwriting each other. It is simpler than git, which stores the entire change history.

It requires three additional key-value pairs, each holding a UUIDv4:

  • id: optional, as it only helps to ensure that the update targets the correct initial package.
  • rev: the new revision ID.
  • old_rev: the revision ID of the package that is being updated.

When I send an update, I have to include the rev of the previous package as old_rev. If somebody else has changed the content in the meantime and the package has a new rev, I first have to pull that change and merge it; then I can push my change with the now-current rev as old_rev.
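The check described above is a form of optimistic concurrency control. A minimal sketch, assuming the id, rev and old_rev keys from the proposal and an in-memory Package type; Package, Conflict, apply_update and new_rev are hypothetical names, and a real ELN would also have to persist revisions and perform the merge:

```python
import uuid
from dataclasses import dataclass


class Conflict(Exception):
    """Raised when an update was based on an outdated revision."""


@dataclass
class Package:
    id: str
    rev: str
    content: dict


def new_rev() -> str:
    return str(uuid.uuid4())  # the proposal uses UUIDv4 for rev values


def apply_update(current: Package, update: dict) -> Package:
    """Accept `update` only if its old_rev names the current revision."""
    if update.get("id", current.id) != current.id:  # id is optional
        raise Conflict("update targets a different package")
    if update["old_rev"] != current.rev:
        raise Conflict("package changed since this update was prepared; pull and merge first")
    return Package(id=current.id, rev=update["rev"], content=update["content"])


pkg = Package(id=str(uuid.uuid4()), rev=new_rev(), content={"title": "v1"})
mine = {"id": pkg.id, "old_rev": pkg.rev, "rev": new_rev(), "content": {"title": "v2"}}
pkg = apply_update(pkg, mine)  # accepted: old_rev matched the current rev
```

A second update still carrying the old old_rev would now raise Conflict, which is exactly the "pull and merge first" situation.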

How do we want to track whether content changed after the .eln import?

Description formats

Have we discussed yet how to deal with different description formats? Specifically, I'm talking about the text property we use within datasets, files, comments, etc. In the current examples, one can find at least three different formats: HTML, Markdown and plain text. Generally, ELNs could either leave the text as-is and hope for the best, or attempt to convert it into a suitable format. However, such conversions can be error-prone, especially when the ELN first needs to detect the source format from the contents.

A related question, at least for Markdown, HTML, etc., is how to deal with linked, ELN-internal images. Even if these were included in the crate, the corresponding URLs in the text would need to be updated somehow.
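Once linked images are copied into the crate, rewriting their URLs could look roughly like this. A sketch, assuming the exporting ELN knows a mapping from its internal image URLs to crate-relative paths (the mapping and the function name are hypothetical). The regular expressions only cover simple Markdown ![alt](url) and HTML <img src="..."> forms, so a real implementation would likely use proper Markdown and HTML parsers:

```python
import re


def rewrite_image_links(text: str, mapping: dict[str, str]) -> str:
    """Replace image URLs found in `mapping`; unknown URLs are left unchanged."""
    def swap(url: str) -> str:
        return mapping.get(url, url)

    # Markdown images: ![alt](url) or ![alt](url "title")
    text = re.sub(r'(!\[[^\]]*\]\()([^)\s]+)',
                  lambda m: m.group(1) + swap(m.group(2)), text)
    # HTML images: <img ... src="url" ...> (double-quoted src only)
    text = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]*)(")',
                  lambda m: m.group(1) + swap(m.group(2)) + m.group(3), text)
    return text
```

This also illustrates the detection problem: the function applies both patterns blindly, which is harmless for links but would not be for format conversion.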
