bids-standard / bep028_bidsprov

Organizing and coordinating BIDS extension proposal 28 : BIDS Provenance

Home Page: https://bids.neuroimaging.io/bep028

License: Creative Commons Attribution 4.0 International

Python 79.41% Makefile 0.18% MATLAB 14.63% Shell 5.78%

bep028_bidsprov's People

Contributors

cmaumet tiborauer dbkeator omar-rifai neurorepro starborn cskrn satra inria-empenn remi-gau yarikoptic yibeichan bclenet

bep028_bidsprov's Issues

Storing RRID's

This is an update proposal for BIDS Prov (BEP028).

Problem Statement

As discussed at the BIDS-Prov meeting (November 8, 2021), we need a way in BIDS-Prov to store the RRID of a software agent.

Note: RRIDs (Research Resource Identifiers) are unique identifiers for research resources, including software agents, as defined on the scicrunch.org website. More info at: scicrunch.org Getting Started and in this paper (10.1016/j.neuron.2016.04.030). Importantly, RRIDs are not version-specific (e.g. SPM5 and SPM12 share the same RRID:SCR_007037).

Rationale

Currently in the draft BIDS-Prov

Currently in the BIDS-Prov specification, RRIDs are stored with a specific attribute rrid:

2.3 Agent (Optional)

Including an Agent record is OPTIONAL. If included, each Agent record has the following fields:

Key name	Description
@id	REQUIRED. UUID. A (randomly assigned) identifier for the software (this identifier will be used to associate activities with this software).
rrid	OPTIONAL. URI. URI of the RRID for this software package (cf. scicrunch).
...

Pros:

A specific tag for rrids will make it very easy to query and retrieve this information
A specific tag for rrids prevents using other types of identifiers

Cons:

A specific RRID tag means that we will have to create a term (or reuse an existing one if already available)

Other alternatives

Dandi

... uses an "identifier" term, which is separate from the "@id"

Pros:

A specific tag for identifiers will make it very easy to query and retrieve this information
A generic "identifier" tag allows for using other types of identifiers

Cons:

When querying for the identifier, it might not be clear which type of identifier is retrieved (RRID or something else?), although this might not be entirely true if a URI is used as the value, in which case the base URL can indicate the type of identifier.
id and identifier are very close words and might be confused

Question: has the dandi model been released? Can we directly reuse the terms?

Multiline definitions for spm_parser.py

Is your feature request related to a problem? Please describe.
When running

curl -LJO https://raw.githubusercontent.com/incf-nidash/nidmresults-examples/master/spm_group_ols/batch.m

the downloaded batch.m contains the following lines

matlabbatch{1}.spm.stats.factorial_design.des.t1.scans = {
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-01/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-02/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-03/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-04/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-05/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-06/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-07/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-08/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-09/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-10/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-11/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-12/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-13/con_0001.nii,1'
                                                          '/storage/essicd/data/NIDM-Ex/BIDS_Data/RESULTS/EXAMPLES/ds011/SPM/LEVEL1/sub-14/con_0001.nii,1'
                                                          };

Commit 564df3a introduced a quick fix: we just ignore those lines.
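Rather than ignoring these lines, a parser could merge each multiline cell array into a single logical definition before processing it. The following is only a minimal sketch of that idea, not the code in spm_parser.py: the function names `join_multiline_definitions` and `cell_items` are hypothetical, and it assumes that a multiline definition starts on a line ending with `{` and ends on a line starting with `}`. MATLAB strings containing escaped quotes (`''`) are not handled.

```python
import re


def join_multiline_definitions(lines):
    """Merge matlabbatch assignments whose cell array spans several lines.

    Assumption: a definition is open from a line ending in '{' until the
    next line that starts with '}' (e.g. '};').
    """
    merged, buffer = [], None
    for raw in lines:
        line = raw.strip()
        if buffer is None:
            if line.endswith("{"):        # start of a multiline cell array
                buffer = [line]
            elif line:                    # ordinary single-line definition
                merged.append(line)
        else:
            buffer.append(line)
            if line.startswith("}"):      # closing brace ends the definition
                merged.append(" ".join(buffer))
                buffer = None
    return merged


def cell_items(definition):
    """Return the single-quoted strings found in one merged definition."""
    return re.findall(r"'([^']*)'", definition)


batch = """matlabbatch{1}.spm.stats.factorial_design.dir = {'/tmp/out'};
matlabbatch{1}.spm.stats.factorial_design.des.t1.scans = {
    'sub-01/con_0001.nii,1'
    'sub-02/con_0001.nii,1'
    };
"""
definitions = join_multiline_definitions(batch.splitlines())
scans = cell_items(definitions[1])
```

With this approach each scan path becomes available as a separate item, so the parser could create one input entity per file instead of dropping the definition.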

Add CodeOfConduct, Contributing guidelines, etc ...

Update paths in BIDS-Prov examples

Paths should be changed to BIDS urls, for example:
https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/from_parsers/spm/spm_default_batch.jsonld#L272

Ephys example

bids-standard/bep021#8 (comment)

Finalize proposal submission

First pass at updating the BEP028 spec to be compliant with latest discussions.

Add usage examples of this spec for the following:

1- simplest model with one activity and its parameters,
2- workflow with more than one activity,
3- encoding the environment (Matlab, docker, bash, etc.),
4- an example with hierarchies of activities (use isPartOf from PROVONE),
5- an example that uses the ontology of processings / image types,
6- an example discussing file-level prov vs. dataset-level prov.

List all features that are in the examples but not yet in the specification [0.5D].

Deliverable: Itemized list in a markdown document in current repository.

Fork BIDS-specs and create a PR with the BIDS prov spec [1D]

BIDS spec repo: https://github.com/bids-standard/bids-specification
BIDS prov spec inside 03-modality-agnostic-files.md

globbing to describe collections of files

For now we use globbing to represent collections of files
Quoting the W3C PROV document: collections are defined as entities providing structure on top of other entities. In the context of file enumeration, I found it easier to use a syntax that many users are familiar with.

Another aspect of entities in our framework is the "sha" field, which is used for quick equality checking between entities. For a single file, the solution is simply to call a sha function on the file. To fill the "sha" field for a collection of files, we can chain sha functions, i.e. apply a sha function to the list of individual sha results.

With many images in a directory named 'fM00223', this gives

sha1sum fM00223/*.img | cut -d " " -f 1 | sha1sum | cut -d " " -f 1  # "cut" is used to trim filenames infos returned by "sha1sum"

which yields a single value for all .img files in this directory

This proposition aims to facilitate integration with existing software (eg. globbing is used in the SPM GUI to select files) as well as keeping our prov files as concise as possible
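The same "sha of shas" can be computed without a shell. The sketch below is an illustration, not part of the repository: the function names are hypothetical, and it assumes the files are hashed in sorted path order. The shell expands `*.img` in locale collation order, so the two results only agree when both orderings coincide.

```python
import glob
import hashlib


def file_sha1(path):
    """SHA-1 hex digest of one file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def collection_sha1(pattern):
    """SHA-1 of the newline-terminated list of per-file SHA-1 digests.

    Mirrors `sha1sum <pattern> | cut -d " " -f 1 | sha1sum`, assuming
    the files are visited in sorted order.
    """
    digests = [file_sha1(p) for p in sorted(glob.glob(pattern))]
    listing = "".join(d + "\n" for d in digests)
    return hashlib.sha1(listing.encode("ascii")).hexdigest()
```

For example, `collection_sha1("fM00223/*.img")` would return a single hex digest for all .img files in that directory.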

Provide more human-readable labels for activities

Is your feature request related to a problem? Please describe.
Currently the labels for the activities are automatically extracted from the keys in the matlabbatch, this is great but could be improved to have more human-readable labels.

Describe the solution you'd like
We could add a mapping between the keys in the matlabbatch and a human-readable name/label. This could be stored in a parameter file very similarly to what is done to add inputs to activities.
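As a rough illustration of such a mapping, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `LABELS` entries, their wording, and the `human_label` helper are hypothetical, and in practice the mapping could be loaded from a JSON parameter file instead of being hard-coded.

```python
import re

# Hypothetical mapping from matlabbatch key prefixes to readable labels.
LABELS = {
    "spm.spatial.realign.estwrite": "Realign: Estimate & Reslice",
    "spm.spatial.coreg.estimate": "Coregister: Estimate",
    "spm.spatial": "Spatial processing",
}


def human_label(batch_key, labels=LABELS):
    """Return the label of the longest known prefix of a matlabbatch key.

    Falls back to the key itself (without the 'matlabbatch{N}.' part)
    when no prefix is known.
    """
    key = re.sub(r"^matlabbatch\{\d+\}\.", "", batch_key)
    parts = key.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in labels:
            return labels[prefix]
    return key
```

Matching on the longest prefix lets one entry cover all sub-keys of a module while still allowing more specific overrides.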

What do you think?

implement Digest Attributes for entity (file)

implement Digest Attributes for entity (file)
from BIDS_prov specification
Digest RECOMMENDED. Dict. For files, this would include checksums of files. It would take the form {"<checksum-name>": "value"}.
-> one checksum SHA for each file (input or output)
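A possible way to fill this field is sketched below in Python. This is an illustration under assumptions, not the implemented feature: the helper names are hypothetical, the choice of SHA-256 is arbitrary, and the exact spelling of the key ("Digest") is taken from the specification excerpt above.

```python
import hashlib


def file_digest(path, algorithm="sha256"):
    """Return {"<checksum-name>": "value"} for one file."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {algorithm: h.hexdigest()}


def with_digest(entity, path, algorithm="sha256"):
    """Return a copy of an entity record with its "Digest" field filled."""
    record = dict(entity)
    record["Digest"] = file_digest(path, algorithm)
    return record
```

Calling `with_digest` once per input or output file gives one checksum per entity, as described above.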

afni_default example [1day]

Similarly to what was done for spm_default in https://github.com/Inria-Visages/BIDS-prov/projects/3 but for afni_default.

Deliverables : drawing of the pipeline in the current repository + JSON-LD example in the current repository + add any update to be considered for the spec in the list created in #10.

FSL default v2 [3D][PR]

in a Pull Request

bring dead nodes back to life, linking them to their correct entities
validate graph with @cmaumet
iterate

Follow up Copenhagen BIDS-Prov meeting

This issue is open to keep track of discussions / things to do following BIDS-derivatives meeting :

We need an example in BIDS-Prov spec on how to include custom code for some of the steps
The file-level provenance that we had in an older version can be more intuitive when writing the provenance of a single file and should probably be included back in the spec
- Tentative examples from ephys are at: bids-standard/bep021#8 (comment)
Question: could the BIDS-Prov file be in .prov.json rather than .prov.jsonld, to be simpler for the devs who know JSON? (suggestion by Arnault)
We should consider making it possible to use BIDS url as identifiers

fsl_default example [1 day]

Similarly to what was done for spm_default in https://github.com/Inria-Visages/BIDS-prov/projects/3 but for fsl_default.

Deliverables : drawing of the pipeline in the current repository + JSON-LD example in the current repository + add any update to be considered for the spec in the list created in #10.

BIDS-Prov meeting: Nov 2, 3pm UTC

Hi everyone!

We are very happy to announce the first meeting to discuss BIDS-Prov that will be held, November 2nd, 3pm UTC i.e. 8am CA / 11am ET / 5pm Paris.

In this meeting we will discuss BIDS-Prov examples for SPM and FSL and how to make them more concise (similarly to the approach taken in reproschema).

Camille & @satra

Get a complete example of the BIDS prov spec on `spm_default` use case [2D]

Create a drawing of the pipeline
Rework the JSON-LD

Prepare 1 spec update for discussion [3h]

select one spec update proposal from the list created in #10
follow the issue template created in #14

Deliverable: An issue describing a proposed update on the spec

Pass all tests in linkchecker into BIDS-prov PR to BIDS-standard [3h]

Things to try:

remove problematic URL?

Deliverable: PR to BIDS-standard to pass all tests

BIDS-Prov meeting: Jan 24

Dear BIDS-Prov folks,

Thanks for joining us on our last BIDS-Prov meeting. Here is a brief summary of what happened and what we would like to focus on next.

First, the minutes of our two last calls are available at:

On our last call, we agreed on reviewing the BIDS-Prov specification (Google doc) and adding comments for any questions by our next call.

Thank you all and looking forward to seeing you on our next meeting on February 7. In preparation of our meeting, please feel free to include the points you would like to discuss directly in the agenda.

As always, I'd be very happy to answer any questions. Your contributions are very much appreciated!

Camille

Note: We meet every two weeks by videoconference on Mondays at 7-8am PDT / 10am-11am EDT / 3-4pm BST. The group is always open to new contributors interested in neuroimaging data sharing. To join the call or to ask any question, please email us at [email protected].

set of examples to cover

Can we agree on a set of examples to cover?

For now I have only considered the first few examples from the "Dataset and examples" chapter from the SPM12 documentation.

@cmaumet it would probably be easier if we designate a set of examples in advance, along with a few mandatory fields in each example.

Read how to write a good README - https://mozilla.github.io/open-leadership-training-series/articles/opening-your-project/write-a-great-project-readme/

Finish spm_default example [1day]

cf. #7

Write end-to-end examples

Make examples fully BIDS compliant

bids folder structure
include sidecar .json file

finish visualisation of SPM example [3.5 D]

example (2d)
validation / tech meeting (1/2 d)
final commit (1d)

Participate to BrainWeb virtual hackathon [3 days]

Join the BrainWeb, take part in the kickoff on April 6 and attend the virtual hackathon.

Deliverable: write-up about this experience of joining a virtual hackathon (possibly also about the project(s) you joined). This will be published on the Empenn blog/website.

datalad extractor

a custom extractor has been written for NIDM-results, should we do the same for BIDS-prov ?
@cmaumet

clear explanation of an Activity

as discussed with @cmaumet , @dbkeator and @satra on November 2nd

What do we mean by Activity ?
Currently the examples encapsulate everything related to the run : an activity is the call of an Agent on a specific set of entities, and includes parameters (as defined in a batch.m for example)

Perhaps we should rather separate user-oriented graph definition from a more complete description, which would contain Activities, and the full set of parameters

Multiple Entities as input/output

Update proposal for BIDS Prov (BEP028)

Problem Statement

Defining a pipeline usually consists of linking functions (Activities) to their inputs/outputs (Entities), given the context (Agents).
Allowing only one input/output pair per Activity would probably lead to defining an artificially high number of activities, and quickly become cumbersome.

Rationale

As an example, the segment activity in the SPM default example takes a single entity as input (a .nii file with an updated header) and generates 5 distinct tissue files, so we need to allow multiple inputs/outputs to be declared. This way several entities can link to the same activity, which is convenient for reading/querying.

Minimal example

Here is the entity definition for spm_default/coreg_and_segment.json

      {
        "@id": "niiri:fsiud1",
        "label": "tissue1",
        "wasAttributedTo": "RRID:SCR_007037",
        "wasGeneratedBy": "niiri:sdfsdofjiosdf",
        "derivedFrom": "niiri:fsiudfqsoi938409283409fdskj",
        "prov:atLocation": "$HOME/spm12/tpm/TPM.nii,1"
      },
      {
        "@id": "niiri:fsiud2",
        "label": "tissue2",
        "wasAttributedTo": "RRID:SCR_007037",
        "wasGeneratedBy": "niiri:sdfsdofjiosdf",
        "derivedFrom": "niiri:fsiudfqsoi938409283409fdskj",
        "prov:atLocation": "$HOME/spm12/tpm/TPM.nii,2"
      },
...

and here is the associated subgraph
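Because several entities point to the same activity through wasGeneratedBy, the outputs of an activity can be recovered by grouping on that field. A minimal Python sketch, using the two entity records from the excerpt above (the function name `outputs_by_activity` is illustrative only):

```python
from collections import defaultdict


def outputs_by_activity(entities):
    """Group entity ids by the activity that generated them."""
    grouped = defaultdict(list)
    for entity in entities:
        activity = entity.get("wasGeneratedBy")
        if activity is not None:
            grouped[activity].append(entity["@id"])
    return dict(grouped)


entities = [
    {
        "@id": "niiri:fsiud1",
        "label": "tissue1",
        "wasGeneratedBy": "niiri:sdfsdofjiosdf",
        "derivedFrom": "niiri:fsiudfqsoi938409283409fdskj",
    },
    {
        "@id": "niiri:fsiud2",
        "label": "tissue2",
        "wasGeneratedBy": "niiri:sdfsdofjiosdf",
        "derivedFrom": "niiri:fsiudfqsoi938409283409fdskj",
    },
]
outputs = outputs_by_activity(entities)
```

Here `outputs` maps the single segment activity to both tissue entities, without needing one activity per output.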

Create an issue template that we will use to propose updates on the spec [2h]

Choose one feature that is not currently in the spec and use it as an example to create an issue template for proposal of updates on the spec.

This issue template will include the following: "Minimal example before/after", "rationale" and probably more.

Look at examples of issues in the https://github.com/bids-standard/bids-specification/ to see if there is a common structure / existing issue templates?

Deliverable: a markdown file (in the current repo) as Github issue template.

parameters encoding

Update proposal for BIDS Prov (BEP028)

Problem Statement

At OHBM we already had a few questions about how one should encode parameters.

For the moment we allow passing any JSON-compliant values into the attributes field of an Activity.

We should not encode parameters ourselves, but provide a way to encode them.
My suggestion is that we define a new type named Parameter.

Checklist

update new_features.md at the root of this project

Start "utils" library (include viz from JSON-LD 1.1) [1 day]

define a context.json at the root

put everything into a context.json file and reference it simply by providing the URI

add clear explanations for spec update [5D]

update new_features.md
create issue for type indexing
create issue for "Activity definitions"
create issue for "Activities attributes"
create issue for Multiple entities as input
validate

First blog article

@cmaumet we should also think about a first article to be posted by the end of this month, as discussed in meeting.

It can be very simple (eg. just showcasing what's been done), but we should seek a formal definition of what's inside

Cheers,
Rémi

High Level Example

add an example that is high level, i.e where a node in the graph encapsulates a call to a docker container, as in FMRI prep

SPM Parser for BIDSProv

Is your feature request related to a problem? Please describe.
Right now we provide short examples that rely on our model.
To speed up the inclusion/discussion of new examples we need to automate their translation into .jsonld or .turtle files that respect BEP028.

Describe the solution you'd like
We want a parser that

takes .m files as input, as those provided in nidm-results
outputs a valid .jsonld file

This parser will have to be updated as the specification evolves.

TODO

get spm_default and spm_groups_ols
get a first example, with each cell in the original .m file having an activity in the produced sidecar file
write a showcase example of a pipeline, e.g. parsing | visualisation
write a GitHub action to make sure the parser does not fail on a set of .m files (regression testing)

Choose names to replace Activity/Entity/Agent in the BIDS-Prov skeleton

(This issue is opened following progress made on the specification at the OHBM Brainhack.)

Problem Statement

In the BIDS-Prov skeleton, we are currently referring directly to PROV terms (Activity/Entity/Agent).

Those should be replaced by subtypes that will be specific to BIDS-Prov (but generic enough to encompass any type of object).

Rationale

As a starting point "Activity" could be replaced by "Processing" as discussed w/ @ssaneei and Michael Dayan. "Entity" by "InputOutput" and "Agent" by "SoftwarePackage"?

Minimal example

{
  "@context": "https://purl.org/nidash/bidsprov/context.json",
  "BIDSProvVersion": "1.0.0",
  "records": {
    "SoftwarePackage": [
      {
        ...
      }
    ],
    "Processing": [
      {
        ...
      },
      {
        ...
      }
    ],
    "InputOutput": [
      {
        ...
      },
      {
        ...
      }
    ]
  }
}

Log of related discussion on Gdoc:

Updating description in BIDS-prov PR to BIDS-standard [1/2 day]

In a google doc, write up a description that will go at the top of PR to BIDS-standard including: listing all authors from the original google doc, giving a high-level description of what this PR is doing.
Share with cmaumet and satra for feedback
Post on the PR.

Deliverable : PR to BIDS-standard with description updated.

type indexing explained

Update proposal for BIDS Prov (BEP028)

Problem Statement

BIDS-prov provides a framework to describe any neuroimaging pipeline as a graph of operations, defined over digital entities
For our description to be generic enough we use 3 main concepts: Activities, Entities, and Agents.

Once our graph is built, we want to allow a broad range of operations on it. The most basic operation we could think of is querying the graph, e.g. to search for an entity given part of its name.

Rationale

JSON-LD does not impose any constraint on how we should define a graph; all we have to do is respect the JSON syntax. Activities, Entities, and Agents could be defined anywhere, in any order, which makes the graph harder to inspect.

For our queries to be written easily, and run fast, we have to find a compromise between the flexibility of the JSON syntax and constraints on the structure of our graph.

For this purpose we use type indexing, which consists of using the types of the digital objects we describe as the primary key. This gives a very simple structure to our graph (a key for Agents, one for Entities, and one for Activities), while still allowing flexible definitions under each of those keys.

Minimal example

Here is an extract from examples/spm_default/realign.json

    ...
    "prov:Agent": [
      {
        "@id": "RRID:SCR_007037",
        "@type": "prov:SoftwareAgent",
        "label": "SPM"
      }
    ],
    "prov:Activity": [
      {
        "@id": "niiri:fdskjfnskjndflqkjndl",
        "label": "realign",
        "wasAssociatedWith": "RRID:SCR_007037",
        "startedAtTime": "10/10/2020 00:00:00",
        "endedAtTime": "10/10/2020 01:00:00",
        "used": "niiri:sjhgdqd",
        "attributes": [
          ["eoptions.quality", 0.9],
          ["eoptions.sep", 4],
          ["eoptions.fwhm", 5]
        ]
      }
    ],
    "prov:Entity": [
      {
        "@id": "niiri:fdsjnflqj12381U39fdskjnf",
        "wasAttributedTo": "RRID:SCR_007037",
        "wasGeneratedBy": "niiri:fdskjfnskjndflqkjndl",
        "derivedFrom": "niiri:sjhgdqd",
        "label": "Realigned func"
      }
    ]
  }

and here is how it would look WITHOUT TYPE INDEXING

    ...
     [
      {
        "@id": "RRID:SCR_007037",
        "@type": "prov:Agent",
        "label": "SPM"
      },
      {
        "@id": "niiri:fdskjfnskjndflqkjndl",
        "@type" : "prov:Activity",
        "label": "realign",
        "wasAssociatedWith": "RRID:SCR_007037",
        "startedAtTime": "10/10/2020 00:00:00",
        "endedAtTime": "10/10/2020 01:00:00",
        "used": "niiri:sjhgdqd",
        "attributes": [
          ["eoptions.quality", 0.9],
          ["eoptions.sep", 4],
          ["eoptions.fwhm", 5]
        ]
      },
      {
        "@id": "niiri:fdsjnflqj12381U39fdskjnf",
        "@type" : "prov:Entity",
        "wasAttributedTo": "RRID:SCR_007037",
        "wasGeneratedBy": "niiri:fdskjfnskjndflqkjndl",
        "derivedFrom": "niiri:sjhgdqd",
        "label": "Realigned func"
      }
    ]
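To make the difference concrete for querying, the sketch below searches for records by a fragment of their label in both layouts. It is a plain-Python illustration under assumptions (the function names are hypothetical, and "@type" is assumed to be a single string, whereas JSON-LD also allows a list), not a replacement for a proper JSON-LD or SPARQL query.

```python
def find_by_label(records, record_type, fragment):
    """Type-indexed layout: only the list stored under record_type is scanned."""
    fragment = fragment.lower()
    return [
        r for r in records.get(record_type, [])
        if fragment in r.get("label", "").lower()
    ]


def find_by_label_flat(nodes, record_type, fragment):
    """Flat layout: every node must be inspected and filtered on '@type'."""
    fragment = fragment.lower()
    return [
        n for n in nodes
        if n.get("@type") == record_type
        and fragment in n.get("label", "").lower()
    ]


indexed = {
    "prov:Agent": [{"@id": "RRID:SCR_007037", "label": "SPM"}],
    "prov:Activity": [{"@id": "niiri:fdskjfnskjndflqkjndl", "label": "realign"}],
    "prov:Entity": [{"@id": "niiri:fdsjnflqj12381U39fdskjnf", "label": "Realigned func"}],
}
flat = [
    dict(record, **{"@type": record_type})
    for record_type, items in indexed.items()
    for record in items
]
hits = find_by_label(indexed, "prov:Entity", "realign")
flat_hits = find_by_label_flat(flat, "prov:Entity", "realign")
```

Both calls find the same entity, but the type-indexed version goes straight to the relevant list and the records do not need to repeat their "@type".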

Checklist

links to related existing issues and/or PR

rewrite README and repo transfer to BIDS-spec

rewrite taking https://github.com/bids-standard/bep001 as example

Write the README at the repository root to enable contributions (via issues) - https://github.com/Inria-Visages/BIDS-prov [0.5 day]
Review by Camille and iteration/changes [0.25 day]
Request a transfer to BIDS-standard - https://github.com/BIDS-standard/BEP028 [Camille]

BEP update

Hey BEP028!

Happy new year! I am opening up this issue to inquire if you all may have a status update to share? These updates are shared on our website. I have included a couple points to guide the update.

BEP status update:

Status update on BEP 028
Sharing the blocking items or sticking points
Items left to discuss and clarify

Thank you!

investigate type indexing [1D]

@cmaumet can you provide more resources for this?

OHBM abstract

Spec update issue template not displayed

Hi @remiadon! I've merged your pull request creating the issue template for spec update proposals (#19) but for some reason it is not displayed at: https://github.com/bids-standard/BEP028_BIDSprov/issues/new/choose

Can you look into this?

URN and UUID

Update proposal for BIDS Prov (BEP028)

Problem Statement

At the BIDS-Prov meeting (November 8, 2021), we agreed to include the following in BIDS-Prov:

For UUIDs that we can resolve : we'll use a specific prefix (similar to e.g. dandiasset for DANDI)
For UUIDs that we cannot resolve from any service in any way : use an urn prefix

Looking more closely at URNs, it looks like they have to be accompanied by a registered namespace identifier, cf. https://en.wikipedia.org/wiki/Uniform_Resource_Name.

We could directly reuse the already registered UUID namespace identifier, e.g. "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66"

@satra: can you confirm that the latter (i.e. using urn:uuid) is what you had in mind?

Once we have converged on this, TODO:

Update the spec and examples to replace all instances of "niiri:" with "urn:uuid:"
Add a section in the BIDS-Prov spec explaining how identifiers are chosen (two options as described above)
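The first TODO item could be partly automated. The sketch below is one possible approach, assuming the existing "niiri:" identifiers are not UUIDs themselves and therefore each gets a freshly generated one; the function name `to_urn_uuid` is hypothetical. It relies on the standard library `uuid` module, whose `UUID.urn` attribute returns the "urn:uuid:..." form.

```python
import uuid


def to_urn_uuid(node, mapping=None, prefix="niiri:"):
    """Return a copy of a JSON-like structure with prefixed ids replaced.

    Every distinct string value starting with `prefix` is mapped to one
    new "urn:uuid:..." identifier, so references stay consistent.
    """
    if mapping is None:
        mapping = {}
    if isinstance(node, dict):
        return {k: to_urn_uuid(v, mapping, prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [to_urn_uuid(v, mapping, prefix) for v in node]
    if isinstance(node, str) and node.startswith(prefix):
        if node not in mapping:
            mapping[node] = uuid.uuid4().urn   # e.g. "urn:uuid:6e8b..."
        return mapping[node]
    return node
```

Identifiers with other prefixes (for example "RRID:") are left untouched, which matches the idea of keeping resolvable identifiers under their own prefix.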

validator module

One thing we might want to do at some point, as part of providing a provenance framework, is to provide a validator for it.
If an institution or a user creates prov files, we should give them a program to check the validity of those files within the framework.

This program should :

take *.json files as input
return a boolean value at the very end
raise warnings and errors along the way to enlighten the user about ways to fix those issues, in a clear and understandable manner.

In other words running this program acts as a sanity check.

For warnings and errors, one option would be the Python logging module, but that looks a bit tedious here. For a V1 I think we can use the warnings module, raise a warning if anything looks invalid, and just return False in any other situation.
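One reading of this V1 is sketched below. It is only an illustration under assumptions: the function name `validate_prov_file` is hypothetical, and the checks (top-level "@context" and "records" keys, an "@id" on every record) are placeholders for whatever the specification ends up requiring. It returns False whenever a warning was emitted or the file could not be read.

```python
import json
import warnings

# Assumed minimal requirements; the real list would come from the spec.
REQUIRED_KEYS = ("@context", "records")


def validate_prov_file(path):
    """Return True if the file passes all checks, warning on each problem."""
    try:
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
    except (OSError, ValueError) as err:       # unreadable or invalid JSON
        warnings.warn(f"{path}: not readable as JSON ({err})")
        return False
    if not isinstance(doc, dict):
        warnings.warn(f"{path}: top level must be a JSON object")
        return False

    valid = True
    for key in REQUIRED_KEYS:
        if key not in doc:
            warnings.warn(f"{path}: missing top-level key {key!r}")
            valid = False

    records = doc.get("records", {})
    if not isinstance(records, dict):
        warnings.warn(f"{path}: 'records' must be an object")
        return False
    for group, items in records.items():
        for i, rec in enumerate(items if isinstance(items, list) else []):
            if not isinstance(rec, dict) or "@id" not in rec:
                warnings.warn(f"{path}: {group}[{i}] has no '@id'")
                valid = False
    return valid
```

Running it over several files and combining the results with `all(...)` would give the single boolean mentioned above.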

Meeting with committee [3D]

User stories

Until now we have discussed pros and cons of different concepts and features in BEP028. The few use cases implemented so far are sidecar .jsonld files corresponding to standard examples. To go a little deeper and reach a broader range of users, we would like to formulate real-life examples and discuss their implementation with the current standard.

First set of user stories

As a researcher I'd like to find out which realignment algorithm was applied in order to understand how it affects my final results.
As an SPM developer implementing the BIDS-Prov export I'd like to get a list of all activities in order to verify that it is consistent with my matlabbatch script.
As an SPM user I'd like to visualize the BIDS-Prov graph corresponding to my matlabbatch file in order to get a visual representation of my pipeline (for example to be shared in a paper).

Activity attributes as DataElements

provide an example using nidm:DataElement
rewrite SPM and FSL examples if needed
provide a query example
update new_features.md

An example using DataElement in turtle format here
