theelnconsortium / theelnfileformat



Specification for the ELN File Format

License: MIT License

Python 100.00%

theelnfileformat's Introduction

ELN file format (.eln)

Description

The ELN file format is an archive format for exchange of experimental results and data.

This file format can be created and read by software such as Electronic Laboratory Notebooks.

Specification

The file format is specified in the SPECIFICATION.md file. It follows the RO-Crate specification and bundles the files into a ZIP archive.

The .eln format has a registered media type (previously known as MIME type), application/vnd.eln+zip. See the IANA assignment: https://www.iana.org/assignments/media-types/application/vnd.eln+zip
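Software that serves or detects .eln files can map the extension to this media type. The following is a minimal sketch that uses Python's standard `mimetypes` module and registers the mapping at runtime for the current process only. It does not change any system configuration, and the file name `some-data.eln` is just an example.

```python
import mimetypes

# Map the .eln extension to its IANA-registered media type for this process.
mimetypes.add_type("application/vnd.eln+zip", ".eln")

print(mimetypes.guess_type("some-data.eln"))  # ('application/vnd.eln+zip', None)
print(mimetypes.guess_extension("application/vnd.eln+zip"))  # .eln
```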

Origin

This file format was created to improve interoperability between ELN software. The developers of several ELNs decided that users would benefit from an archive format that multiple ELNs can easily understand.

Status

Generally working with some quirks here and there.

Known implementations

Implementation	.eln import	.eln export	Example
eLabFTW	✅	✅	elabftw
Kadi4Mat	✅	✅	kadi4mat
PASTA	✅	✅	PASTA
SampleDB	✅	✅	SampleDB
RSpace	✅	✅	RSpace

theelnfileformat's People

Contributors

Stargazers

Watchers

Forkers

jmanideep florianrhiem sensolytics-gmbh nhanlon2 steffenbrinckmann opensemanticlab stain joedavies-6

theelnfileformat's Issues

Trailing slashes in context URLs

After having serious trouble exporting correct JSON-LD, I think the RO-Crate spec has an error: The @context URIs should end in a “/”, shouldn’t they? For instance:

{
  "@context": {
    "@vocab": "https://w3id.org/ro/crate/1.1/context/",
    "sm": "http://scimesh.org/SciMesh/"
  },
  "@graph": [
    {
      …
    }
  ]
}
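The underlying reason is that JSON-LD 1.1 expands a plain term against `@vocab` by simple string concatenation, without inserting a separator. The same concatenation applies to a prefix such as "sm" in a compact IRI like "sm:temperature". The sketch below is a simplified model that covers only those two concatenation steps. It does not handle term definitions with options, remote contexts, or base IRIs. The function name `expand_term` is hypothetical.

```python
def expand_term(term: str, context: dict) -> str:
    """Simplified JSON-LD term expansion: direct terms, prefixes and @vocab only."""
    if isinstance(context.get(term), str):
        # Term mapped directly to an IRI in the context.
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        # Compact IRI such as "sm:temperature": prefix IRI + suffix.
        return context[prefix] + suffix
    if sep:
        # Absolute IRI (or unknown prefix): left unchanged.
        return term
    # Plain term: appended to @vocab with no separator added.
    return context["@vocab"] + term


with_slash = {"@vocab": "https://w3id.org/ro/crate/1.1/context/", "sm": "http://scimesh.org/SciMesh/"}
without_slash = {**with_slash, "@vocab": "https://w3id.org/ro/crate/1.1/context"}

print(expand_term("name", with_slash))  # https://w3id.org/ro/crate/1.1/context/name
print(expand_term("name", without_slash))  # https://w3id.org/ro/crate/1.1/contextname
print(expand_term("sm:temperature", with_slash))  # http://scimesh.org/SciMesh/temperature
```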

List ELN implementations

Hi, would you be able to list/link implementations of TheELNFileFormat on the README?

The challenge is that "ELN" is an overloaded term, so it's a bit tricky to search for in docs or on Google.

So far I think the implementations are: eLabFTW, RSpace, PASTA-ELN (documentation link?)

https://github.com/TheELNConsortium/#participating-eln-projects has a bit more but I guess not all have implemented ELN format yet.

Then I can update https://github.com/ResearchObject/ro-crate/blob/master/docs/profiles.md#electronic-lab-notebook-eln or make a separate "in-use" page from the RO-Crate web site to help promote further.

Citations like https://doi.org/10.1186/s13326-021-00257-x would also be useful to capture!

Question concerning directory structure

I read the specification as saying that the .eln file and the folder it contains should have the same name:

Inside a .eln file, there MUST be a folder that will contain the rest of the data. The name of the folder SHOULD be the same as the archive name. This folder at root prevents issues when opening the file as a zip file and getting archived files extracted in the current directory, possibly overwriting other files, and probably polluting the current directory. […]

<root>
  some-data.eln/
    - ro-crate-metadata.json
    - experimentA/:[…]

However, to my knowledge, that would not work: unzipping the .eln file would try to create a directory with the same name as the file being unzipped.
The examples therefore do not conform to my interpretation of the text either. They either use a completely different folder name (acceptable, because the text says SHOULD and not MUST), or the .eln/zip file contains a directory named like the archive but without the .eln extension (collections-example.eln -> collections-example/).

I would be glad if the text could be reworded so that it cannot be read the way I read it. Alternatively, the examples in this repository could show the described behaviour, together with a description of how to create and extract zip files whose folder is named exactly like the archive.
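One layout that avoids the clash is the one the examples already use. The root folder takes the archive name without the .eln extension, so extracting collections-example.eln next to itself creates only collections-example/. The sketch below writes and checks such an archive with Python's standard zipfile module. It illustrates one reading of the SHOULD, not the specification's wording. The function names and the placeholder metadata are hypothetical.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def write_eln(archive_path: Path, metadata: dict, files: dict) -> None:
    """Write an .eln whose single root folder is the archive name minus '.eln'."""
    root = archive_path.stem  # "collections-example.eln" -> "collections-example"
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{root}/ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for relative_path, data in files.items():
            zf.writestr(f"{root}/{relative_path}", data)


def top_level_entries(archive_path: Path) -> set:
    """Return the names at the top of the archive; a conforming file has exactly one."""
    with zipfile.ZipFile(archive_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "collections-example.eln"
        placeholder = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}
        write_eln(archive, placeholder, {"experimentA/data.csv": b"a,b\n1,2\n"})
        print(top_level_entries(archive))  # {'collections-example'}
        # Extracting next to the archive creates a sibling folder, with no name clash.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        print(sorted(p.name for p in Path(tmp).iterdir()))
        # ['collections-example', 'collections-example.eln']
```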

Sensolytics example file

Hi,
I tried to add a first example for our measurement data, but I was not allowed to push a branch to the repository to create a pull request. That might not be necessary anyway, as my main goal was to see whether the participating systems can read the format. I have attached the README and the .eln file here. Let me know if we can or should change anything.
sensolytics.zip

media type / content type / MIME type for .eln

Hello. Have you considered registering a media type / content type / MIME type for .eln?

My understanding is that it involves listing your format here: http://www.iana.org/assignments/media-types/media-types.xhtml

Flexible metadata in .eln files

Currently, the .eln format does not have a unified way of exporting flexible metadata. Instead, most ELNs export a data structure specific to that ELN in JSON format which contains various information about a dataset, including some flexible metadata. As they are (mostly) representations of internal models, they vary quite a bit.

Motivation

While it is already useful to be able to reference samples, measurements and other objects from other ELNs with some generic metadata, such as the creation and modification times and the author, it would be even better if we could exchange flexible metadata about them. In the last meeting, we briefly discussed the goal of a "gold standard" experiment that can be represented as an .eln file and imported and exported by the various ELNs. For this, we should be able to exchange information such as instrument or process parameters. We cannot expect to define these strictly. Instead, each entry should map a (textual) identifier to data of some type.

Ideas / Suggestions

For mapping identifiers to values, the PropertyValue should be useful, as it can map its propertyID to its value, which can be a boolean, text, a number or the generic StructuredValue type, and also supports units and a human-readable version as a fallback. So, if we had to store a temperature with the identifier target_temperature, it could be represented as:

{
  "@type": "PropertyValue",
  "propertyID": "target_temperature",
  "value": 293.15,
  "unitCode": "KEL",
  "unitText": "K",
  "description": "293.15 K"
}

while a boolean instrument setting could be represented like this:

{
  "@type": "PropertyValue",
  "propertyID": "vacuum_enabled",
  "value": false
}

If we provided an array of such property values, we could support a flat mapping of identifiers to values. The Dataset type, which we use for datasets in the ro-crate-metadata.json, already has such an array in its variableMeasured property. However, this use case goes beyond "variables that are measured in some dataset". We could therefore use that property and stretch its definition quite a bit, use another existing property, or branch off and define a custom property.

PropertyValue objects can contain a value of type StructuredValue, and PropertyValue is itself a subtype of StructuredValue, so it might also be possible to implement nested data structures this way. Alternatively, the structure could be encoded in the identifier.
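Both options can be sketched as plain Python dicts that serialize to JSON-LD. The first is a flat list in variableMeasured with nesting through a PropertyValue used as the value. The second encodes the structure in the identifier. In this sketch, `property_value` is a hypothetical helper and the dataset `@id` is a placeholder. The dot-separated identifier in option B is an assumed convention that nothing in the specification defines.

```python
import json


def property_value(property_id, value, unit_code=None, unit_text=None, description=None):
    """Build a schema.org PropertyValue node as a plain dict."""
    node = {"@type": "PropertyValue", "propertyID": property_id, "value": value}
    # Optional keys are emitted only when given.
    for key, val in (("unitCode", unit_code), ("unitText", unit_text), ("description", description)):
        if val is not None:
            node[key] = val
    return node


target = property_value("target_temperature", 293.15, "KEL", "K", "293.15 K")
vacuum = property_value("vacuum_enabled", False)

# Option A: nesting, with a PropertyValue (a StructuredValue subtype) as the value.
heating_nested = property_value("heating_stage", property_value("hold_time", 30, unit_text="min"))

# Option B: flat, with the structure encoded in the identifier (assumed "." separator).
heating_flat = property_value("heating_stage.hold_time", 30, unit_text="min")

dataset = {
    "@id": "./experimentA/",  # placeholder dataset id
    "@type": "Dataset",
    "variableMeasured": [target, vacuum, heating_nested],
}
print(json.dumps(dataset, indent=2))
```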

What are your thoughts on storing flexible metadata in PropertyValue objects? Which solution for attaching these to the datasets do you prefer? How should we deal with nested metadata? Would you prefer to store flexible metadata outside the ro-crate-metadata.json entirely or in a custom format instead?

Zenodo support

inveniosoftware/product-rdm#133

Interoperability

Hi,

When testing interoperability between Kadi4Mat, eLabFTW and RSpace (the ones I currently have access to), I found the expected behaviour in only one case: exporting from Kadi4Mat and importing into eLabFTW (see the table below). I'm using the latest version of eLabFTW, the hosted demo instance of Kadi4Mat and RSpace Community. This continues this post, where I've continued to use MWEs consisting of single entries with text and a tag in each case.

Import (below) / Export (right)	Kadi4Mat	eLabFTW	RSpace
Kadi4Mat	n/a	❌	❌
eLabFTW	✅	n/a	❌
RSpace	❌	❌	n/a

Signatures for .eln files

Currently, there is no mechanism to ensure that an .eln file was actually created by a specific ELN. This raises issues of data provenance and of how far the information inside the .eln can be trusted.

Motivation

While ELNs generally assume that the information from users can be trusted, there are cases where it may be necessary to know that the information was entered at a specific time by a specific person, and where proof is needed for this instead of relying on the users to be trustworthy. To achieve this, many ELNs include mechanisms such as timestamping, versioning or signing of entered information.

For those ELNs, importing an .eln file must not circumvent these mechanisms, as that would render them useless. All imported information therefore has to be marked as coming from an import by user X at date/time Y. A chain of trust for information from .eln files could improve the situation, as another ELN might be more trustworthy than users in these cases. The information would still need to be marked as imported, but it could be attributed to a specific ELN instead of a user.

Ideas / Suggestions

The .eln file consists of an ro-crate-metadata.json and various other files, which should be listed in the ro-crate-metadata.json file. If possible, the ro-crate-metadata.json should include SHA256 hash values for those files, which allows us to trust that those files have not been tampered with (or suffered from data rot) as long as the ro-crate-metadata.json itself is trustworthy. As such, implementing a system of trust for the ro-crate-metadata.json should be satisfactory to ensure that the whole .eln file can be trusted.
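The integrity part of this can be checked with the standard library alone. The following is a minimal sketch under two assumptions. First, each File entity in the @graph stores its hex-encoded digest in the schema.org `sha256` property. Second, its @id is a path relative to the single root folder. It only shows that the files match ro-crate-metadata.json. Trusting the metadata file itself is the part that still needs a signature. The function names are hypothetical.

```python
import hashlib
import json
import zipfile


def sha256_of_member(zf: zipfile.ZipFile, name: str) -> str:
    """Stream one archive member through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with zf.open(name) as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file_hashes(archive_path) -> dict:
    """Map each File @id that declares a sha256 to whether the stored bytes match."""
    with zipfile.ZipFile(archive_path) as zf:
        names = set(zf.namelist())
        # The metadata file sits directly inside the single root folder.
        meta_name = next(
            n for n in names if n.count("/") == 1 and n.endswith("/ro-crate-metadata.json")
        )
        root = meta_name.split("/", 1)[0]
        graph = json.loads(zf.read(meta_name))["@graph"]
        results = {}
        for entity in graph:
            types = entity.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "File" not in types or "sha256" not in entity:
                continue
            member = f"{root}/{entity['@id'].removeprefix('./')}"
            # A listed file that is missing from the archive counts as a mismatch.
            results[entity["@id"]] = (
                member in names and sha256_of_member(zf, member) == entity["sha256"].lower()
            )
        return results
```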

A typical approach would be to provide a signature for the ro-crate-metadata.json alongside the file itself. To do this, we need to decide which digital signature scheme to use, how keys are discovered and how the signature is stored inside the .eln file.

Digital signature scheme

There are various schemes for generating keys, signing a sequence of bytes and verifying the signature, and I'm not knowledgeable enough in this area to suggest a specific one. However, I would suggest using a scheme that is already widely used and supported.

Key distribution/discovery

Digital signature schemes use a key (or rather, a key pair) to create the signature. We will therefore need a method of distributing or discovering the key (or rather, the public key) used for an .eln file.

One approach would be a chain of trust for those keys, similar (or ideally identical) to the one used for X.509 certificates in TLS / HTTPS. This has the advantage of being a well-known scheme that is already widely implemented, and most web-based ELNs already have a certificate (and associated key). The certificate chain could be provided alongside the signature. The signature could then be checked as long as the root certificate authority is known and trusted, and the certificates used for the .eln file have not expired or been revoked.

Another approach that would work for web-based ELNs would be to either query the origin ELN for its public key via HTTPS, or to submit the signature and ro-crate-metadata.json to the origin ELN for validation. The latter would have the advantage that we do not need to agree on a digital signature scheme at all, and that there's no need for key pairs, etc. as the signature could be generated and validated in various ways depending on the preferences of the ELN developer. The big disadvantage for both of these, of course, is that the origin ELN would need to be accessible, which is not always the case due to network issues or simply ELNs running behind a firewall.

Signature storage

The signature could be stored either in an additional file at a well-known location inside the .eln archive or in the extra field of the ZIP file. I personally think storing it in a file might be preferable for its simplicity, but for programs both methods should be equally easy to implement.
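The file-based option can be sketched without committing to a signature scheme. In this sketch, `sign` and `verify` are caller-supplied callables, so any widely supported public-key scheme could be plugged in later, and only the storage side is shown. The file name `ro-crate-metadata.json.sig` is a hypothetical well-known location, not something the specification defines. Placing it inside the root folder keeps the single-folder layout intact.

```python
import zipfile
from typing import Callable

# Hypothetical well-known location, next to the metadata file inside the root folder.
SIGNATURE_FILE = "ro-crate-metadata.json.sig"


def add_signature(archive_path, root: str, sign: Callable[[bytes], bytes]) -> None:
    """Sign the exact stored bytes of ro-crate-metadata.json and append the signature."""
    with zipfile.ZipFile(archive_path) as zf:
        metadata = zf.read(f"{root}/ro-crate-metadata.json")
    with zipfile.ZipFile(archive_path, "a") as zf:
        zf.writestr(f"{root}/{SIGNATURE_FILE}", sign(metadata))


def check_signature(archive_path, root: str, verify: Callable[[bytes, bytes], bool]) -> bool:
    """Return False if no signature is stored, otherwise the verifier's verdict."""
    with zipfile.ZipFile(archive_path) as zf:
        sig_name = f"{root}/{SIGNATURE_FILE}"
        if sig_name not in zf.namelist():
            return False
        return verify(zf.read(f"{root}/ro-crate-metadata.json"), zf.read(sig_name))
```

The function signs the raw bytes as stored in the archive rather than re-serialized JSON. Re-serializing can change key order or whitespace and would make a valid signature fail to verify.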

Personally, I think piggybacking on the infrastructure and expertise behind TLS / HTTPS would be easiest. Done this way, importing an .eln file should be as secure as importing the information directly from the origin ELN via HTTPS, and the origin ELN's domain could be shown as the source of the information without an additional caveat.

What are your thoughts on this?

additionalType

Hello all!

I'm working on allowing the creation of a .eln that contains both experiments and resources (the two main entity types in eLabFTW). Previously, it could contain only one or the other.

In order to convey that information, I chose to use the additionalType property of the @type: Dataset.

So during import, I look at the additionalType property to figure out whether this node describes an experiment or a resource (which in eLabFTW can be anything, such as an antibody or a microscope).

The value could simply be "experiments" or "resources", but I'm opening this issue to discuss with you what the best value would be.

I'm thinking of using http://semanticscience.org/resource/SIO_000994. And the code basically is: if it's that URI, it's an experiment, otherwise it's a resource. There are many others to choose from: https://lov.linkeddata.es/dataset/lov/terms?q=Experiment
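The rule just described can be sketched over the @graph of a parsed ro-crate-metadata.json. The sketch skips the RO-Crate root data entity (the Dataset with @id "./"). It also accepts additionalType as either a single value or an array, because JSON-LD allows both. The helper names are hypothetical, and eLabFTW's actual import code may differ.

```python
SIO_EXPERIMENT = "http://semanticscience.org/resource/SIO_000994"


def as_list(value) -> list:
    """JSON-LD allows a single value or an array; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def entity_kind(node: dict) -> str:
    """'experiment' if additionalType is the SIO experiment URI, otherwise 'resource'."""
    return "experiment" if SIO_EXPERIMENT in as_list(node.get("additionalType")) else "resource"


def split_datasets(graph: list) -> dict:
    """Group the @id of every Dataset node except the root data entity by kind."""
    kinds = {"experiment": [], "resource": []}
    for node in graph:
        if "Dataset" in as_list(node.get("@type")) and node.get("@id") != "./":
            kinds[entity_kind(node)].append(node["@id"])
    return kinds
```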

So if you've already dug into these aspects on your side, please let me know here so that I can use the same value as you!

Incremental updates to .eln

How do we want to handle updates to the .eln?

My suggestion: use the CouchDB approach, which prevents competing changes from overwriting each other. It is simpler than git, which stores the entire change history.

It requires three additional key-value pairs:
id: uuid_v4
rev: uuid_v4
old_rev: uuid_v4

id is optional as it only helps to ensure that the update targets the correct initial package.
rev: is a new rev_id
old_rev: is the rev_id of the package that is updated

When I send an update, I have to include the rev of the package I based it on (as old_rev). If somebody else has changed the content in the meantime and the package has a new rev, I first have to pull that change and merge it. Then I can push my change with the now-known rev as old_rev.
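The receiving side of this optimistic-concurrency check can be sketched in a few lines, assuming updates arrive as dicts carrying the proposed id, rev and old_rev keys plus the new content. The proposal uses UUIDv4 values rather than CouchDB's own `_rev` format. `Package`, `apply_update` and `RevisionConflict` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


def new_uuid() -> str:
    return str(uuid.uuid4())


class RevisionConflict(Exception):
    """The update was based on a revision that is no longer the current one."""


@dataclass
class Package:
    """Hypothetical receiving side holding the current state of one .eln package."""
    id: str = field(default_factory=new_uuid)
    rev: str = field(default_factory=new_uuid)
    content: dict = field(default_factory=dict)

    def apply_update(self, update: dict) -> str:
        # 'id' is optional; when present it must match the targeted package.
        if update.get("id", self.id) != self.id:
            raise ValueError("update targets a different package")
        # 'old_rev' must be the revision the sender started from.
        if update["old_rev"] != self.rev:
            raise RevisionConflict(f"current rev is {self.rev}, update is based on {update['old_rev']}")
        self.content = update["content"]
        self.rev = update["rev"]
        return self.rev


pkg = Package(content={"title": "v1"})
base = pkg.rev
pkg.apply_update({"id": pkg.id, "old_rev": base, "rev": new_uuid(), "content": {"title": "v2"}})
try:
    # A second writer still holding the old rev is rejected and must pull and merge first.
    pkg.apply_update({"old_rev": base, "rev": new_uuid(), "content": {"title": "stale"}})
except RevisionConflict as err:
    print("pull and merge first:", err)
```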

How do we want to track, if content changed after the .eln import?

Description formats

Have we discussed yet how to deal with different description formats? Specifically, I'm talking about the text property we use within datasets, files, comments, etc. In the current examples, one can find at least three different formats: HTML, Markdown and plain text. Generally, ELNs could either leave the text as-is and hope for the best, or attempt to convert it into a suitable format. However, such conversions can be error-prone, especially when the ELN first needs to detect the source format from the contents.

A related question, at least for Markdown, HTML, etc., is also how to deal with linked, ELN-internal images. Even if these would be included in the Crate, the corresponding URL would need to be updated somehow in the text.

Validation schema and online editor

We are currently implementing RO-Crate / the ELN File Format for OpenSemanticLab.

Along the way, we will create a validation schema using OO-LD.

As discussed with @SteffenBrinckmann, I'm opening this issue to share the work early and provide a first preview (code):
playground example
