Comments (15)
@yarikoptic if you have time, would you mind updating the top message so we can know what aspects of this issues have NOT yet been addressed or clarified sufficiently by #946
from bids-specification.
wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?
This is the can of worm issue. We know it's there and that we are going to have to open it at some points but nobody likes to eat worm and not just because some of us are vegetarians.
from bids-specification.
I will @Remi-Gau ... I guess feel welcome to close if I don't get back to it let's say by Sep 1 (hopefully before ;)).
from bids-specification.
and only now realized that we do not "officially/clearly" have a way to define the most common .json
for inheritance, which might generalize for ALL other files.
Well -- dataset_description.json
is kinda like that file -- since I guess all information pertinent to the dataset as a whole is pertinent to individual files as well. But then should it be considered while constructing the full metadata record for a particular leaf file?
from bids-specification.
Thinking about it even more, I still like my algorithmic approach (with a single caveat outlined in the bottom of the original post, still thinking about it) to the definition and would like to propose to remove that restriction to have only a single "applicable" file at any given level
.
- as stated above, the most common file to absorb all "common" fields is actually already there and it is
dataset_description.json
. Whatever applies to the entire dataset does apply to each file by default IMHO. - I might want to have
ses-X.json
on top level to provide specific common fields for any subject/modality for ses-X in the study (e.g. that a specific scanner/software was used - a real use case from today!), which was different from another scanner in ses-Y. - And then I would like to have
task-A_bold.json
andtask-B_bold.json
which would have other common fields specific to each task so I could have a quick overview of differences between tasks.
I could furthermore provide specifications for differentacq-
etc, pretty much may be even per each_key-
if I am that good. That makes it possible to scale nicely for large(r) studies and provide a nice summary of differences on top of the dataset.
In above example, all 3 specific types of .json files (or at least two, if we exclude dataset_description.json
) at the same top level should then be used while "inheriting".
Instead of the rule for "a single applicable file" there could be a rule (with a validator check) that files at any given level must not be in conflict, in that they must not re-define the same field (i.e. a field F to be defined both in task-_bold.json and ses-.json files) thus making it ambiguous since would depend on the order on how "inheritance" is done across those files at the same level. If there is no overlap in fields, no ambiguity -- no need for a restriction.
from bids-specification.
Continuing talking to myself, here is a list of non task-
files in the top level to be inherited:
$> ls ds00*/*json | grep -v -e dataset_description -e participants -e task- | xargs -n 1 basename | sort | uniq -c
1 acq-epi_T1w.json
1 acq-flipangle05_run-01_MEFLASH.json
1 acq-flipangle30_run-01_MEFLASH.json
1 acq-moldOFF_T1w.json
1 acq-moldON_T1w.json
1 acq-mprage_T1w.json
1 bold.json
1 dir-0_epi.json
1 dir-1_epi.json
1 dir-AP_epi.json
1 dir-PA_epi.json
3 dwi.json
1 inplaneT2.json
3 phasediff.json
1 run-1_echo-1_FLASH.json
1 run-1_echo-2_FLASH.json
1 run-1_echo-3_FLASH.json
1 run-1_echo-4_FLASH.json
1 run-1_echo-5_FLASH.json
1 run-1_echo-6_FLASH.json
1 run-1_echo-7_FLASH.json
1 run-2_echo-1_FLASH.json
1 run-2_echo-2_FLASH.json
1 run-2_echo-3_FLASH.json
1 run-2_echo-4_FLASH.json
1 run-2_echo-5_FLASH.json
1 run-2_echo-6_FLASH.json
1 run-2_echo-7_FLASH.json
24 T1w.json
2 T2w.json
attn @tyarkoni (happen you didn't spot my whining here before)
from bids-specification.
wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?
from bids-specification.
TBH I'm not sure I understand the issue. It's just that we should be able to have multiple applicable JSON files at the same level?
I remember when we made that rule, and it was because the precedence within a level is ambiguous, so overrides could be implementation- or even RNG-dependent.
Given that there is some appetite for removing inheritance altogether in BIDS 2.0, I can't imagine the people who feel that way being pleased with it getting even more byzantine.
from bids-specification.
HED (Hierarchical Event Descriptors) relies heavily on inheritance. Specifically the _events.json
files can be inherited. This is crucial for annotation, as it allows a user to provide a single file at the top level containing the event annotations for the entire dataset.
Having a single _events.json
sidecar at the top level is recommended process for annotation. To force users to make copies of the annotation at lower portions of the dataset means that users of the data would have no idea whether the same annotations were being used across datasets.
HED uses the rule that _event.json
files can be placed at any level. A duplicated key at a lower level overrides the contents at a key at a higher level. We haven't considered that there might be multiple applicable _events.json
sidecars at the same level.
HED really needs to have inheritance, but we only need one applicable sidecar at a given level. @sappelhoff @dungscout96 @smakeig
from bids-specification.
I would just add:
- If we are to keep inheritance, encoding it in the schema somehow might be a good idea, as currently any validator implementation would need to hard-code both when and how it applies.
- Perhaps it is worth considering whether we could drop the inheritance principle in the future. The amount of text duplication it prevents is trivial in terms of size on disk, and it makes datasets both more monolithic and less transparent --- particularly for manual inspection.
from bids-specification.
from bids-specification.
Dear @VisLab , I am 100% with you on this! Please (attn @TheChymera as well) follow up on the dedicated issue for that at bids-standard/bids-2-devel#36. Here we are not getting rid of Inheritance principle but working toward "fixing it"! ;)
Although I see the desire expressed by @TheChymera as "encoding it in the schema somehow", I would argue that "there is no need" (if not impossible).
Rationale: The entire src/schema can be loaded only following an "algorithm" which is coded up in tools/schemacode. schema/README provides description of the related data structures and some parts of the algorithm with that. Using that description in README.md and implementation in dandischematools as "canonical" other tools could provide implementations to load the schema. So we already have a duality of "schema" and "algorithm".
Situation is likely to stay the same for Inheritance principle. And IMHO it is ok to just express it as an algorithm (original desire) or a clear set of rules (current formulation) which gets coded up in dandischematools
. I think current formulation in common principles: The Ineritance Principle (let's call it IP here) got much better through the work by @Lestropie in #946. There IP is expressed as a set of rules, and they can be expressed as an algorithm (multiple versions of it really, let's not get there here ;) ) and coded up. In that PR we have touched on the aspect the IP is not yet covering and which I have mentioned in my original description. We agreed with @Lestropie in the PR #946 that it needs to be done post that PR, but AFAIK it was not yet addressed. That should not prevent us though from implementing current version (I filed #1181) of the IP and then see where it is not yet complete or fails empirically.
But as current IP was cleared up in #946 I think it did address most of my original questions -- I think we can let this issue RiP, and instead file more specific issues or PRs to make "inheritance principle better" (hopefully without introducing backward incompatible changes). FTR here is a summary of issues/questions I had in mind and which might have been "addressed" or not (separate issues):
- comment above: having both
ses-X_bold.json
(for common to ses-X across all subjects metadata) andtask-Y_bold.json
at the same top level. Following 80/20 principle, let's not bother about them. Current rule IP.4 "partially" forbids it. "Partially" because I could haveses-1_bold.json
andsub-{1,2,...}/task-Y_bold.json
to overcome in some ugly way. But I do consider such combinatorics well within those 20% to not bother about - Original description had a case of having both
task-task1_acq-X_bold.json
andtask-task1_bold.json
-- I filed a dedicated #1182 to address it one way or another; - my ideas throwing about inheriting from datasets_description.json (and probably others like participants.tsvetc) are addressed by the restriction that we must retain the same
_suffix
. Higher level tools would "bind" participant etc metadata with metadata of specific data files.
from bids-specification.
@yarikoptic thanks for pointing out the main issue.
@VisLab I agree that it is more convenient when taking a monolithic approach to the dataset, though I disagree that:
Now if a change needs to be made [...] It really is a mess.
Tools can be easily written to summarize information if needed.
I would argue the reverse, that changes (infrequent) can easily be made across low-level files with ubiquitous tools such as find
, grep
, and sed
, and information integration across inheritance levels (frequent) is non-trivial (e.g. #1182 ) and constitutes another nonstandard extraneous library, which may be handled differently in different packages.
In any case for the time being inheritance will not be dropped due to backwards-compatibility, and perhaps better discussed in the issue linked by @yarikoptic above.
from bids-specification.
Much of the earlier text posted in this issue relates to my follow-up proposal to #946, being #1003. Sorry I didn't find this thread and link to it.
IMO it should be both possible and recommended to define at higher levels of the filesystem hierarchy any metadata that are common to many files at lower levels of the hierarchy based on a set of common entities. This is a natural recapitulation of the hierarchical nature of the metadata. Duplicating such data, whether electively or through obligation due to removal of the inheritance principle, would mean that any application that depends on such information may need to check for consistency of those data across sidecar files, which contra-indicates such a design.
I spent a fair bit of time devising the language of #1003 as a way of systematizing the definition of how sidecar files can be present both within and across filesystem levels, and their relative ordering is wholly unambiguous, without a need to access the contents of those files. I tried to do it in a way that was both comrehensible for users, and sufficiently precise to guide software implementations. This was a natural extension of the existing IP, and was adequate for my own use case (though see bids-standard/bids-bep016#50 for the latest on that topic).
@yarikoptic may actually be looking for something even more general & powerful here. As shown in Example 2 in the current iteration of #1003, there are some circumstances in which having multiple sidecar files in a single filesystem hierarchy level is still impermissible: even though both possess a subset of the entities in the data file of interest, there is no objective way to decide the order in which they should be loaded, and therefore, if there is a metadata field that is present in both files, the content of that metadata field as applied to the data file of interest is ambiguous.
Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.
Given the amount of resistance I've had to #1003, I highly doubt the chances of such a proposal getting through; but I wanted to elucidate here fully given the relevance. If anything I think this would actually be more intuitive for users willing to exploit this level of complexity, as they could simply define metadata fields in sidecar files at the correct level with the correct set of entities that define their relevance. But complexity of both understanding and implementation is not something to commit to without very careful thought.
from bids-specification.
Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.
;-) "great mind think alike" (using it 2nd time this Monday, a good sign!!) -- I have left https://github.com/bids-standard/bids-specification/pull/1003/files#r940247739 before reading this reply (due to the ordering of my mailbox ;-) ). Yes, it would add more complexity to validator and somewhat to implementation of inheritances principle, but IMHO worthwhile.
from bids-specification.
Related Issues (20)
- schema: should associations/bval/target have a suffix? HOT 1
- mention that session level folder can be dropped from derivatives if not relevant HOT 4
- anonymization → deidentififcation (or pseudonymization) HOT 2
- entity table renders is "sub-optimal" in PDF
- Forcing .gz compression on physio files not optimal HOT 5
- [BUG] `Namespace.to_dict()` does not completely work
- [ENH] Accept Parquet (`<entities>_<suffix>.parquet`) as alternative to `.tsv` and `.tsv.gz` formats HOT 3
- [ENH] allow timestamps for physio table HOT 15
- BEP032 Modality, Datatype, and Suffix Name(s) HOT 40
- RepetitionTimePreperation (Preperation → Preparation?)
- update qMRI to align with the new definition of derivatives HOT 4
- BIDS motion -- how to categorize IMU data as a modality? HOT 2
- Dashes in pipeline names / versions
- formatting issue HOT 2
- no "IntendedFor" for "inplaneT[12]" HOT 1
- [BUG] Inconsistent level for `MagneticFieldStrength` in sidecar rules of schema HOT 5
- Classify "phenotype/" as a datatype directory with no subject/session parent HOT 6
- Background suppression timing for ASL
- Change the way the BIDS spec changelog is created. HOT 1
- Motion-BIDS for MRI
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bids-specification.