
miti's Issues

Governance Board Election Announcement!

It’s time for the MITI Governance Board to update its membership! The MITI Governance Board consists of 4-6 board members who will be responsible for holding regular board and community meetings, responding to community feedback on GitHub, and making group decisions about the maintenance of the MITI metadata standard and associated tools. Board membership is an 18-month term (January 2024-July 2025). We encourage all members of the community to apply.

If you are interested in serving on the Board, please reply to this announcement with your name and a brief statement about your interest in serving on the MITI Board by 11:59 PM ET on October 15, 2023. Voting will take place on GitHub after the submission period closes.

More structured schema format?

It seems that the YAML format for table data specification is informed by the choice of Cerberus as an underlying validator.

Would the consortium be willing to consider Frictionless Data (FD) instead as the underlying data schema specification framework? It has broad environment support, with Python, JS, and bash tools that make FD data packages natively available in various settings (local prototyping, headless/remote computation, web applications). As an example of a project that uses FD under the hood, see the C2M2 (Crosscut Metadata Model).

If such a move (albeit quite significant) could be on the table, in a few days I can make a PR to share some scripts that I am currently using to:

  1. Automatically convert the MITI spec into (i) a flat table of fields, (ii) a table of tables, and (iii) a few separate files for the "valid values" currently spread across various YAML files.
  2. Automatically convert (i), (ii), and (iii) into a FD data package ready to get populated with data.

(The above (1) is not a perfect translation yet.)
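To illustrate the kind of conversion described in (1) and (2), here is a minimal sketch, not the scripts mentioned above. It assumes each MITI YAML file has already been parsed into a mapping of field name to attributes. The names `TYPE_MAP`, `flatten_fields`, and `to_table_schema` are hypothetical, and the choice of Table Schema type for each MITI type is a guess that the schema designers would need to confirm.

```python
# Hypothetical mapping from MITI type names to Table Schema type names.
TYPE_MAP = {
    "boolean": "boolean",
    "integer": "integer",
    "float": "number",
    "string": "string",
    "filename": "string",
    "date": "date",
    "doi": "string",
    "rrid": "string",
}


def flatten_fields(table_name, entries):
    """Turn one parsed MITI table (field name -> attributes) into flat rows."""
    rows = []
    for field_name, attrs in entries.items():
        rows.append({
            "table": table_name,
            "field": field_name,
            "description": attrs.get("description", ""),
            "type": attrs.get("type", "string"),
            "valid_values": attrs.get("valid-values", []),
        })
    return rows


def to_table_schema(rows):
    """Build a Table Schema style descriptor (a plain dict) from flat rows."""
    fields = []
    for row in rows:
        field = {
            "name": row["field"],
            "type": TYPE_MAP.get(row["type"], "string"),
            "description": row["description"],
        }
        if row["valid_values"]:
            # Enumerated valid values become an "enum" constraint.
            field["constraints"] = {"enum": list(row["valid_values"])}
        fields.append(field)
    return {"fields": fields}


# Example input, as it might look after parsing a MITI YAML file.
clinical = {
    "Lost to follow up": {
        "description": "Indicator of whether a patient was lost to follow up.",
        "type": "string",
        "valid-values": ["Yes", "No", "Unknown"],
    },
    "PhysicalSizeZ": {
        "description": "physical size of one pixel in z-dimension",
        "type": "float",
    },
}

rows = flatten_fields("clinical", clinical)
schema = to_table_schema(rows)
```

The resulting `schema` dict could then be written out as JSON and attached to a resource in a data package descriptor.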

Moving to FD offers the advantage that the general purpose validation functionality is high quality and maintained by someone else. A specific advantage is the possibility of foreign-key integrity checks made possible by the schema's awareness of dependencies between tables.
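As a rough illustration of what a foreign-key integrity check involves, the sketch below declares a `foreignKeys` entry in the style of the Table Schema specification and checks it with plain Python. It does not call the Frictionless library itself. The table names, field names, and the `foreign_key_violations` helper are hypothetical, and only single-column keys are handled.

```python
# Hypothetical two-table package descriptor; table and field names are made up.
package = {
    "resources": [
        {
            "name": "participants",
            "schema": {
                "fields": [{"name": "participant_id", "type": "string"}],
                "primaryKey": "participant_id",
            },
        },
        {
            "name": "biospecimens",
            "schema": {
                "fields": [
                    {"name": "biospecimen_id", "type": "string"},
                    {"name": "participant_id", "type": "string"},
                ],
                "foreignKeys": [
                    {
                        "fields": "participant_id",
                        "reference": {
                            "resource": "participants",
                            "fields": "participant_id",
                        },
                    }
                ],
            },
        },
    ]
}


def foreign_key_violations(package, data):
    """Return (resource, field, value) for each value missing from the referenced table."""
    violations = []
    for resource in package["resources"]:
        for fk in resource["schema"].get("foreignKeys", []):
            ref = fk["reference"]
            # Collect every value present in the referenced column.
            allowed = {row[ref["fields"]] for row in data[ref["resource"]]}
            for row in data[resource["name"]]:
                value = row[fk["fields"]]
                if value not in allowed:
                    violations.append((resource["name"], fk["fields"], value))
    return violations


data = {
    "participants": [{"participant_id": "P1"}, {"participant_id": "P2"}],
    "biospecimens": [
        {"biospecimen_id": "B1", "participant_id": "P1"},
        {"biospecimen_id": "B2", "participant_id": "P9"},
    ],
}

print(foreign_key_violations(package, data))
# [('biospecimens', 'participant_id', 'P9')]
```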

Also, I think maintaining the schema as a flat-table-of-fields behind the scenes would simplify the schema designers' work, as it relieves them of the need to hand-edit schema specification files whose syntax is really designed for the use case of machine reading.

I don't have any kind of association with FD. I'm in the Nadeem lab at MSKCC, and we're making software that would stand to benefit from the MITI standards.

The MITI spec is really comprehensive, and many groups would benefit from standardization in this domain! Thank you for your work in this important effort.

Add an automated test to validate YAML files

Add a GitHub Actions workflow that, for each file, verifies that:

  1. It can be loaded by a canonical YAML parser.
  2. Each entry contains description, type and significance fields.
  3. The type is specified as one of the pre-defined choices: boolean, integer, float, string, filename, date, doi, rrid.

Such a workflow will help with automatic validation of future contributions.
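Checks 2 and 3 could be implemented along the following lines. This is a sketch under assumptions: it operates on a mapping that has already been loaded by a YAML parser (which covers check 1), and the function name `validate_entries` and the error message wording are hypothetical.

```python
REQUIRED_KEYS = ("description", "type", "significance")
ALLOWED_TYPES = (
    "boolean", "integer", "float", "string",
    "filename", "date", "doi", "rrid",
)


def validate_entries(entries):
    """Return a list of human-readable problems found in one parsed file."""
    errors = []
    for name, attrs in entries.items():
        if not isinstance(attrs, dict):
            errors.append(f"{name}: entry is not a mapping")
            continue
        # Check 2: every entry has the three mandatory keys.
        for key in REQUIRED_KEYS:
            if key not in attrs:
                errors.append(f"{name}: missing '{key}'")
        # Check 3: the type is one of the pre-defined choices.
        if "type" in attrs and attrs["type"] not in ALLOWED_TYPES:
            errors.append(f"{name}: unknown type '{attrs['type']}'")
    return errors


# Example input, as it might look after parsing a YAML file.
entries = {
    "Object class Description": {
        "description": "Free text description of object class",
        "type": "string",
        "significance": "recommended",
    },
    "PhysicalSizeZ": {
        "description": "physical size of one pixel in z-dimension",
        "type": "double",
    },
}

for problem in validate_entries(entries):
    print(problem)
# PhysicalSizeZ: missing 'significance'
# PhysicalSizeZ: unknown type 'double'
```

A workflow step could run a script like this over every YAML file and exit with a non-zero status if any problems are reported.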

Design a conditional structure for representing significance values

Whether a certain field is required may be conditioned on the value of other fields. Currently, these relationships are captured in plain text format, e.g.:

Object class Description:
  description: Free text description of object class
  type: string
  significance: If Object Class = 'other', required, otherwise recommended

Consider representing these relationships in a structured format instead. This would make them easier to parse for tools that use MITI. For example, the above could be represented as

Object class Description:
  description: Free text description of object class
  type: string
  significance:
    required: Object class == 'other'

Another example:

PhysicalSizeZ:
  description: physical size of one pixel in z-dimension
  type: float
  significance:
    required: sizeZ > 1

where Object class and sizeZ refer to other existing fields at the top level.
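To show how a tool might consume such a structure, here is a minimal sketch of an evaluator for conditions of the form `<field> <operator> <literal>`. The grammar, the `is_required` function, and the decision to treat a missing field as "not required" are all assumptions for illustration, not part of the proposal.

```python
import ast
import operator
import re

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Field names may contain spaces, so the field is everything before the operator.
_CONDITION = re.compile(r"^(?P<field>.+?)\s*(?P<op>==|!=|>=|<=|>|<)\s*(?P<value>.+)$")


def is_required(significance, record):
    """Evaluate the 'required' condition of a field against one metadata record."""
    condition = significance.get("required")
    if condition is None:
        return False
    if isinstance(condition, bool):
        return condition
    match = _CONDITION.match(condition)
    if match is None:
        raise ValueError(f"cannot parse condition: {condition!r}")
    field = match["field"]
    if field not in record:
        # Assumption: a condition on an absent field does not trigger.
        return False
    # literal_eval safely parses quoted strings and numbers.
    expected = ast.literal_eval(match["value"].strip())
    return _OPS[match["op"]](record[field], expected)


print(is_required({"required": "Object class == 'other'"}, {"Object class": "other"}))
# True
print(is_required({"required": "sizeZ > 1"}, {"sizeZ": 1}))
# False
```

Keeping the grammar this small avoids evaluating arbitrary expressions from schema files, at the cost of not supporting compound conditions.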

Governance

Changes to MITI require a submission (via GitHub Issues, labeled enhancement) with the following information:

  • Scope and Field
  • Summary of changes
  • Example
  • Implementation as a pull request

The community can discuss and vote on the submission via GitHub for at least 30 days.

The governance board - Denis Schapiro (chair - Heidelberg), Sarah Arena (Harvard), Adam Taylor (Sage Bionetworks) - will decide, based on the community vote/response, whether to accept the submission, ask for a revision, or decline it.

If the implementation needs a revision, the revision must be submitted no later than 30 days after acceptance.

valid-values for "Lost to follow up" in 01-clinical.yaml should be quoted

Currently, the valid-values list for the item Lost to follow up contains unquoted entries:

Lost to follow up:
  description: Yes/No/Unknown indicator to identify whether a patient was lost to
    follow up.
  category: Follow-Up
  type: string
  valid-values:
  - Yes
  - No
  - Unknown

When the file is loaded with yaml.load (PyYAML==6.0), the unquoted values Yes and No are converted to the booleans True and False:

>>> import yaml
>>> 
>>> yaml_str = """\
... valid-values:
...     - Yes
...     - No
...     - Unknown
... """
>>> loaded_data = yaml.load(yaml_str, Loader=yaml.FullLoader)
>>> 
>>> print(loaded_data)
{'valid-values': [True, False, 'Unknown']}
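For comparison, a minimal sketch of the proposed fix, assuming PyYAML is installed: quoting the two entries keeps them as strings. `yaml.safe_load` is used here instead of `yaml.load` with `FullLoader`, and it resolves these scalars the same way.

```python
import yaml

# Same list as above, with Yes and No quoted.
quoted = """\
valid-values:
  - "Yes"
  - "No"
  - Unknown
"""

loaded = yaml.safe_load(quoted)
print(loaded)
# {'valid-values': ['Yes', 'No', 'Unknown']}
```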
