Giter Site home page Giter Site logo

Comments (15)

Remi-Gau avatar Remi-Gau commented on June 15, 2024 2

@yarikoptic if you have time, would you mind updating the top message so we can know what aspects of this issues have NOT yet been addressed or clarified sufficiently by #946

from bids-specification.

Remi-Gau avatar Remi-Gau commented on June 15, 2024 1

wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?

This is the can of worm issue. We know it's there and that we are going to have to open it at some points but nobody likes to eat worm and not just because some of us are vegetarians.

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024 1

I will @Remi-Gau ... I guess feel welcome to close if I don't get back to it let's say by Sep 1 (hopefully before ;)).

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

and only now realized that we do not "officially/clearly" have a way to define the most common .json for inheritance, which might generalize for ALL other files.
Well -- dataset_description.json is kinda like that file -- since I guess all information pertinent to the dataset as a whole is pertinent to individual files as well. But then should it be considered while constructing the full metadata record for a particular leaf file?

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

Thinking about it even more, I still like my algorithmic approach (with a single caveat outlined in the bottom of the original post, still thinking about it) to the definition and would like to propose to remove that restriction to have only a single "applicable" file at any given level.

  1. as stated above, the most common file to absorb all "common" fields is actually already there and it is dataset_description.json. Whatever applies to the entire dataset does apply to each file by default IMHO.
  2. I might want to have ses-X.json on top level to provide specific common fields for any subject/modality for ses-X in the study (e.g. that a specific scanner/software was used - a real use case from today!), which was different from another scanner in ses-Y.
  3. And then I would like to have task-A_bold.json and task-B_bold.json which would have other common fields specific to each task so I could have a quick overview of differences between tasks.
    I could furthermore provide specifications for different acq- etc, pretty much may be even per each _key- if I am that good. That makes it possible to scale nicely for large(r) studies and provide a nice summary of differences on top of the dataset.

In above example, all 3 specific types of .json files (or at least two, if we exclude dataset_description.json) at the same top level should then be used while "inheriting".
Instead of the rule for "a single applicable file" there could be a rule (with a validator check) that files at any given level must not be in conflict, in that they must not re-define the same field (i.e. a field F to be defined both in task-_bold.json and ses-.json files) thus making it ambiguous since would depend on the order on how "inheritance" is done across those files at the same level. If there is no overlap in fields, no ambiguity -- no need for a restriction.

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

Continuing talking to myself, here is a list of non task- files in the top level to be inherited:

$> ls ds00*/*json | grep -v -e dataset_description -e participants -e task- | xargs -n 1 basename | sort | uniq -c                                                                
      1 acq-epi_T1w.json
      1 acq-flipangle05_run-01_MEFLASH.json
      1 acq-flipangle30_run-01_MEFLASH.json
      1 acq-moldOFF_T1w.json
      1 acq-moldON_T1w.json
      1 acq-mprage_T1w.json
      1 bold.json
      1 dir-0_epi.json
      1 dir-1_epi.json
      1 dir-AP_epi.json
      1 dir-PA_epi.json
      3 dwi.json
      1 inplaneT2.json
      3 phasediff.json
      1 run-1_echo-1_FLASH.json
      1 run-1_echo-2_FLASH.json
      1 run-1_echo-3_FLASH.json
      1 run-1_echo-4_FLASH.json
      1 run-1_echo-5_FLASH.json
      1 run-1_echo-6_FLASH.json
      1 run-1_echo-7_FLASH.json
      1 run-2_echo-1_FLASH.json
      1 run-2_echo-2_FLASH.json
      1 run-2_echo-3_FLASH.json
      1 run-2_echo-4_FLASH.json
      1 run-2_echo-5_FLASH.json
      1 run-2_echo-6_FLASH.json
      1 run-2_echo-7_FLASH.json
     24 T1w.json
      2 T2w.json

attn @tyarkoni (happen you didn't spot my whining here before)

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

wow -- it is very popular in back references, but nobody voiced their opinions. @tsalo @sappelhoff @Remi-Gau @effigies WDYT?

from bids-specification.

effigies avatar effigies commented on June 15, 2024

TBH I'm not sure I understand the issue. It's just that we should be able to have multiple applicable JSON files at the same level?

I remember when we made that rule, and it was because the precedence within a level is ambiguous, so overrides could be implementation- or even RNG-dependent.

Given that there is some appetite for removing inheritance altogether in BIDS 2.0, I can't imagine the people who feel that way being pleased with it getting even more byzantine.

from bids-specification.

VisLab avatar VisLab commented on June 15, 2024

HED (Hierarchical Event Descriptors) relies heavily on inheritance. Specifically the _events.json files can be inherited. This is crucial for annotation, as it allows a user to provide a single file at the top level containing the event annotations for the entire dataset.

Having a single _events.json sidecar at the top level is recommended process for annotation. To force users to make copies of the annotation at lower portions of the dataset means that users of the data would have no idea whether the same annotations were being used across datasets.

HED uses the rule that _event.json files can be placed at any level. A duplicated key at a lower level overrides the contents at a key at a higher level. We haven't considered that there might be multiple applicable _events.json sidecars at the same level.

HED really needs to have inheritance, but we only need one applicable sidecar at a given level. @sappelhoff @dungscout96 @smakeig

from bids-specification.

TheChymera avatar TheChymera commented on June 15, 2024

I would just add:

  • If we are to keep inheritance, encoding it in the schema somehow might be a good idea, as currently any validator implementation would need to hard-code both when and how it applies.
  • Perhaps it is worth considering whether we could drop the inheritance principle in the future. The amount of text duplication it prevents is trivial in terms of size on disk, and it makes datasets both more monolithic and less transparent --- particularly for manual inspection.

from bids-specification.

VisLab avatar VisLab commented on June 15, 2024

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

Dear @VisLab , I am 100% with you on this! Please (attn @TheChymera as well) follow up on the dedicated issue for that at bids-standard/bids-2-devel#36. Here we are not getting rid of Inheritance principle but working toward "fixing it"! ;)


Although I see the desire expressed by @TheChymera as "encoding it in the schema somehow", I would argue that "there is no need" (if not impossible).

Rationale: The entire src/schema can be loaded only following an "algorithm" which is coded up in tools/schemacode. schema/README provides description of the related data structures and some parts of the algorithm with that. Using that description in README.md and implementation in dandischematools as "canonical" other tools could provide implementations to load the schema. So we already have a duality of "schema" and "algorithm".

Situation is likely to stay the same for Inheritance principle. And IMHO it is ok to just express it as an algorithm (original desire) or a clear set of rules (current formulation) which gets coded up in dandischematools. I think current formulation in common principles: The Ineritance Principle (let's call it IP here) got much better through the work by @Lestropie in #946. There IP is expressed as a set of rules, and they can be expressed as an algorithm (multiple versions of it really, let's not get there here ;) ) and coded up. In that PR we have touched on the aspect the IP is not yet covering and which I have mentioned in my original description. We agreed with @Lestropie in the PR #946 that it needs to be done post that PR, but AFAIK it was not yet addressed. That should not prevent us though from implementing current version (I filed #1181) of the IP and then see where it is not yet complete or fails empirically.

But as current IP was cleared up in #946 I think it did address most of my original questions -- I think we can let this issue RiP, and instead file more specific issues or PRs to make "inheritance principle better" (hopefully without introducing backward incompatible changes). FTR here is a summary of issues/questions I had in mind and which might have been "addressed" or not (separate issues):

  • comment above: having both ses-X_bold.json (for common to ses-X across all subjects metadata) and task-Y_bold.json at the same top level. Following 80/20 principle, let's not bother about them. Current rule IP.4 "partially" forbids it. "Partially" because I could have ses-1_bold.json and sub-{1,2,...}/task-Y_bold.json to overcome in some ugly way. But I do consider such combinatorics well within those 20% to not bother about
  • Original description had a case of having both task-task1_acq-X_bold.json and task-task1_bold.json -- I filed a dedicated #1182 to address it one way or another;
  • my ideas throwing about inheriting from datasets_description.json (and probably others like participants.tsvetc) are addressed by the restriction that we must retain the same _suffix. Higher level tools would "bind" participant etc metadata with metadata of specific data files.

from bids-specification.

TheChymera avatar TheChymera commented on June 15, 2024

@yarikoptic thanks for pointing out the main issue.

@VisLab I agree that it is more convenient when taking a monolithic approach to the dataset, though I disagree that:

Now if a change needs to be made [...] It really is a mess.
Tools can be easily written to summarize information if needed.

I would argue the reverse, that changes (infrequent) can easily be made across low-level files with ubiquitous tools such as find, grep, and sed, and information integration across inheritance levels (frequent) is non-trivial (e.g. #1182 ) and constitutes another nonstandard extraneous library, which may be handled differently in different packages.

In any case for the time being inheritance will not be dropped due to backwards-compatibility, and perhaps better discussed in the issue linked by @yarikoptic above.

from bids-specification.

Lestropie avatar Lestropie commented on June 15, 2024

Much of the earlier text posted in this issue relates to my follow-up proposal to #946, being #1003. Sorry I didn't find this thread and link to it.

IMO it should be both possible and recommended to define at higher levels of the filesystem hierarchy any metadata that are common to many files at lower levels of the hierarchy based on a set of common entities. This is a natural recapitulation of the hierarchical nature of the metadata. Duplicating such data, whether electively or through obligation due to removal of the inheritance principle, would mean that any application that depends on such information may need to check for consistency of those data across sidecar files, which contra-indicates such a design.

I spent a fair bit of time devising the language of #1003 as a way of systematizing the definition of how sidecar files can be present both within and across filesystem levels, and their relative ordering is wholly unambiguous, without a need to access the contents of those files. I tried to do it in a way that was both comrehensible for users, and sufficiently precise to guide software implementations. This was a natural extension of the existing IP, and was adequate for my own use case (though see bids-standard/bids-bep016#50 for the latest on that topic).

@yarikoptic may actually be looking for something even more general & powerful here. As shown in Example 2 in the current iteration of #1003, there are some circumstances in which having multiple sidecar files in a single filesystem hierarchy level is still impermissible: even though both possess a subset of the entities in the data file of interest, there is no objective way to decide the order in which they should be loaded, and therefore, if there is a metadata field that is present in both files, the content of that metadata field as applied to the data file of interest is ambiguous.
Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.
Given the amount of resistance I've had to #1003, I highly doubt the chances of such a proposal getting through; but I wanted to elucidate here fully given the relevance. If anything I think this would actually be more intuitive for users willing to exploit this level of complexity, as they could simply define metadata fields in sidecar files at the correct level with the correct set of entities that define their relevance. But complexity of both understanding and implementation is not something to commit to without very careful thought.

from bids-specification.

yarikoptic avatar yarikoptic commented on June 15, 2024

Theoretically, one could generalise the IP even further, permitting such "parallel" inheritance only if there are no metadata fields common to both files. This would introduce even further complexity to both the IP and the validator; but as long as the validator is capable of checking such exhaustively, software applications would not strictly need to check for such.

;-) "great mind think alike" (using it 2nd time this Monday, a good sign!!) -- I have left https://github.com/bids-standard/bids-specification/pull/1003/files#r940247739 before reading this reply (due to the ordering of my mailbox ;-) ). Yes, it would add more complexity to validator and somewhat to implementation of inheritances principle, but IMHO worthwhile.

from bids-specification.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.