neurodatawithoutborders / nwbinspector Goto Github PK

View Code? Open in Web Editor NEW

13.0 13.0 8.0 3.06 MB

Tool to help inspect NWB files for compliance with NWB Best Practices

Home Page: https://nwbinspector.readthedocs.io/

License: Other

Python 100.00%

nwb

nwbinspector's People

Stargazers

Watchers

Forkers

catalystneuro luiztauffer bendichter h-mayorquin yarikoptic tuanpham96 dysprague alejoe91

nwbinspector's Issues

[Feature]: conda-forge feedstock

What would you like to see added to the NWBInspector?

In light of nwbinspector becoming a dependency of dandi-cli in https://github.com/dandi/dandi-cli/, nwbinspector better becomes available for conda distribution within conda-forge channel so we would be able to continue having dandi client package there for the next release too.

@jwodder , since you have experience with creating feedstock recipes, would you be so kind to prepare one for nwbinspector?

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

Check that large datasets are chunked

[Add Check]: validating video files under external_file attribute.

What would you like to see added to the NWBInspector?

Video files that are linked to nwbfiles under ImageSeries.external_file should be inspected for:

Checking their existence on disk with the path specified. (warning importance high). This check would defer for paths for local disk location vs urls.
Whether video files are of the common formats: .avi, .mp4, .mov, .wmv, .flv. This can be a mild recommendation (low importance)
if external_file is a local path, checking if its an absolute or relative to nwbfile location (low importance)

Pseudocode or other logic of the check

from pathlib import Path
import requests

video_file_path = "documents/folder0/video0.mp4"
video_file_url = "https://dandiarchive.s3.amazonaws.com/blobs/a52/253/a5225398-ec1c-4325-8bbe-83dc73fa8c87"

def video_exist_check(video_path, nwbfile_path):
   video_path_type = video_pathtype_check_absolute(video_path)
   if  video_path_type == "relative":
      pt = Path(nwbfile_path)/Path(video_path)
      return pt.exists()
   elif video_path_type == "url":
      return requests.head(video_path).ok
   else:
      return Path(video_path).exists()

def video_format_check(video_path):
   if Path(video_path).suffix in [".mp4", ".avi", ".wmv", ".mov", ".flv"]:
      return True
   else:
      return False

def video_pathtype_check_absolute(video_path):
   if os.path.isabs(video_path):
      return "absolute"
   elif any([str(video_path).startswith(i) for i in ["ftp://", "http://", "https://"]]):
      return "url"
   else:
      return "relative"

Do you have any interest in helping implement the check function?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

Check Subject Species is (a) specified and (b) follows some ontology

In advance of DANDI intention flag feature, need to add a check that subject species is specified. Without DANDI intention this is probably just a BEST_PRACTICE_SUGGESTION.

Also need to add a check that if subject species is specified, it is specified using binomial nomenclature (with proper capitalization as well). Without DANDI intention this is probably a BEST_PRACTICE_VIOLATION since they did take the time to fill it out. There are some special cases that might be OK, like something we've done before is if it's a special sub-species like 'Rattus norvegicus - Long Evans'. Or it looks like DANDI also supports URL's for taxonomies: https://github.com/dandi/dandi-cli/blob/5229292b777f644c5620bddf2abd4738168f0e81/dandi/metadata.py#L477-L486

But both of these will get elevated to CRITICAL when run with intention=DANDI_UPLOAD or DANDI_PUBLISH.

[Documentation]: Broken/missing link to check function

What would you like changed or added to the documentation and why?

On https://nwbinspector.readthedocs.io/en/dev/best_practices/ogen.html the link the check_optogenetic_stimulus_site_has_optogenetic_series() is missing/broke

Do you have any interest in helping write or edit the documentation?

No.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this change was not already requested?

check for negative spike times

Check that dimension lengths are aligned

e.g. number of elements in ElectricalSeries.electrodes should be the same as number of 2nd dim of ElectricalSeries.data, number of elements in TimeSeries.timestamps should be the same as 1st dim of TimeSeries.data

[Add Check]: Slashes in object names

What would you like to see added to the NWBInspector?

Check that the name field of all objects does not include slashes ('/' or '\').

Not really sure what the neurodata_type should be for this, though. Maybe have it act on anything that is a subclass of a NWBContainer, and then the logic of the test scans all the items inside that container?

Or just have it be a recursive function that starts at the NWBFile container level and then recurses down any objects that are themselves containers?

It's already in the Best Practices: towards the bottom of https://nwbinspector.readthedocs.io/en/dev/best_practices/naming.html#neurodata-types (which will get cleaned up in the next week)

Pseudocode or other logic of the check

No response

Do you have any interest in helping implement the check function?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

[Add Check]: Check that if experimenters string does not have comma

If a user provides a comma-separated scalar string for nwbfile.experimeter, the inspector should recommend that they instead store an array of experimenter names. Same for keywords and related publications.

[Documentation]: add documentation for display_messages_by_importance

nwbinspector/nwbinspector/inspector_tools.py

Line 61 in 262ef38

    
           def display_messages_by_importance(messages: List[InspectorMessage], indent_size: int = 2):

[Add Check]: entire col/row of table is not NaN, and entire axis of TimeSeries.data is not NaN

What would you like to see added to the NWBInspector?

We were in the process of making something like this inspired by the previous commented code, but was also discovered in a recent manual inspection.

Basically, an entire row or column of a DynamicTable, or an entire axis of a TimeSeries, should not all be NaN.

There are some exceptions that we have to observe such as x,y,z for ElectrodeTable. Basically any field that is required schematically but unknown in practice, we'd have to skip. But for axes optionally added or certainly custom fields, they should not be included if they are unknown.

Pseudocode or other logic of the check

all(np.isnan(getattr(obj, "data")))

Do you have any interest in helping implement the check function?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

add tests

[Minor performance]: Use Caching for CI environments

The most common usage of cache actions is to save the environment used to run tests so setup is not necessary.

This is always recommended to be done by hashing the current versions of the requirements.txt so you only do fresh installs when the versions change.

Will make the CI even faster, and reduce matrix-strategy computational waste (so we could really easily extend to 3.8 testing as well).

Rename master to dev

For consistency with the other NWB repos it would be nice to rename the master branch to dev

refactoring

Let's think about how to make this more extendable. How about this: We have a Check class, and this defines:

what neurodata type it is checking
the test
the severity

then nwbinspector simply loops through all neuodata objects, tests which checks apply to each object, and runs them.

NWBInspector would also have a method for adding or removing Checks

[Add Check]: ElectricalSeries.electrodes points to the electrodes table

What would you like to see added to the NWBInspector?

Also, RoiResponseSeries.rois points to a PlaneSegmentation table

Pseudocode or other logic of the check

No response

Do you have any interest in helping implement the check function?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

GH Action for codecov not working

The upload has the warning:

Warning: Unexpected input(s) 'yml', valid inputs are ['token', 'files', 'directory', 'flags', 'aws_curl_args', 'codecov_curl_args', 'commit_parent', 'env_vars', 'fail_ci_if_error', 'file', 'functionalities', 'gcov_args', 'gcov_executable', 'gcov_path_exclude', 'gcov_path_include', 'gcov_prefix', 'gcov_root_dir', 'move_coverage_to_trash', 'name', 'network_filter', 'override_branch', 'override_build', 'override_commit', 'override_pr', 'override_tag', 'path_to_write_report', 'root_dir', 'verbose', 'working-directory', 'xcode_derived_data', 'xcode_package']

https://github.com/codecov/codecov-action v1 has been deprecated. We should use v2 now. I think this is why https://app.codecov.io/gh/NeurodataWithoutBorders/nwbinspector is not showing data.

[Documentation]: Coordinate description for Allen CCF v3

What would you like changed or added to the documentation and why?

The current Best Practices (ported from the old ones): https://nwbinspector.readthedocs.io/en/dev/best_practices/ecephys.html#anatomical-coordinates

indicates that for Allen Institute CCF v3, (+x = posterior, +y = inferior, +z = right).

Is there a documentation I can link specifically to this for the x/y/z conventions? I'm currently only linking to the general CCF page there.
'right' seems subjective to a dorsal or ventral view, is there a better way we can phrase that?

Do you have any interest in helping write or edit the documentation?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this change was not already requested?

why check unique vals <= 4?

This line is cluttering up the inspect output:

nwbinspector/nwbinspector.py

Lines 111 to 112 in 7ceab62

    
           elif len(uniq) <= 4: 
        
               print("NOTE: '%s' %s data has only unique values %s" % (ts.name, type(ts).__name__, uniq))

I get why we are detecting a single unique value or two because it could be a boolean, but should we print anything for 3 or 4 unique values? What is the thinking behind this?

[Feature]: specifiable output format

What would you like to see added to the NWBInspector?

Right now you can save the report file with any suffix, but what is truly written is a .txt format with .rst sectioning. It would be nice to allow multiple formats to be saved such as .md, .doc, etc.

.md would be easy as the main difference is simply the section heading of #/##/### instead of =/-/~

For .doc, it is proposed to use https://pypi.org/project/rstdoc/ ; I've tried it a bit earlier today but didn't have any luck getting it to work through the CLI. It seems to use https://pypi.org/project/pandoc/ on the backend so maybe just tapping into that library would be the way to go

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

By default just print/output to screen and not into a file

Description

During recent NWB-DANDI dev hackathon, during demonstration of the nwbinspector I forgot to ask -- why does it produce/save log into a file instead of just printing it to the screen (which then could be redirected to a file if desired, or there could be an option --output-file FILENAME to make it explicit)?

test for non-ascending "_time" columns of a `TimeIntervals` table

[Feature]: helper tool for running directly on a DANDISet ID #

What would you like to see added to the NWBInspector?

We've been running the inspector on DANDISets using the ros3 mode for a while now, but a lot of it is still manually specified code for resolving the s3 paths. It's pretty easy to do automatically though so might as well port over the code so we can just call a nice simple function directly each time.

Only concern is if it might produce circular dependencies when the dandi-cli merges the integration with the inspector.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

[Bug]: CLI config error-catching

What happened?

Also observing a very bizarre behavior where if there are actual Errors (like a ValueError) triggered within one of the test functions (see PR #106 that fixes the ones I'm seeing), they aren't grabbed within the try/except of run_checks and included in the report, but instead are raised to console as true errors:

but if I don't include the --config dandi they behave as expected and go into that extra section of the final report.

Steps to Reproduce

From a console with working directory located in a folder that has dandiset 000004 (or ros3, alternatively)...


nwbinspector ./000004 --config dandi

and compare to...

nwbinspector ./000004

Traceback

Screenshot in the first section; particulars of the traceback don't really matter, it's the fact that it IS a traceback pushed to the console as the standard out instead of getting grabbed by the report.

Operating System

Windows

Python Executable

Python

Python Version

3.9

Package Versions

versions.txt

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this bug was not already reported?
To the best of your ability, have you ensured this is a bug within the code that checks the NWBFile, rather than a bug in the NWBFile reader (e.g., PyNWB or MatNWB)?

[Add Check]: cell and sample id

What would you like to see added to the NWBInspector?

@satra has requested checks for more complete metadata associated with intracellular electrophysiology patch clamp experiments, including checking for cell and sample ids.

Pseudocode or other logic of the check

No response

Do you have any interest in helping implement the check function?

Yes, but I would need guidance.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

Allow validator error messages to propagate

It only currently tells which files and how many files failed due to validations and other errors. Basically just copy the Exception traceback outside those try/excepts

yield inspector messages

I am working on a test for making sure that all time columns of a TimeIntervals object as ascending. This is a situation where a single object could throw multiple messages, one for each column. I think it would be useful to have the check functions yield the result, which would allow us to iterate over them.

[Discussion]: Add Error Codes?

As per breakout session discussion, it may be worth it to re-examine the question of associating explicit error codes to check functions.

Main point is this could (a) simplify skipping references, (b) allow us to change underlying function name but keep error code the same.

Provide "structured" machine readable output/return

A companion to #76 but largely orthogonal.

Although we have not standardized in dandi-cli much either, and having looked into nwbinspector insides, but keeping in mind dandi/dandi-cli#924, i.e. integration with dandi-cli, I think it would be great if internally all the "hints/warnings/whatever" were represented as some basic data structure (well -- dict) so from CLI I could request just a dump of those hints in machine readable form (e.g. JSON). Default "rendered" of the output could format them as it does now. And then dandi-cli code could use nwbinspector as a library and just get a list of those records for the path(s) it inquire about.

To some degree such an idea of "alternative output formats" could be seen in dandi ls --format option which provides a range of those. Also, datalad inspired folks could use -f with any datalad command ;)

Add flag for DANDI intended upload

Optional flag or config for DANDI alteration of importance levels of certain checks.

Idea from Dorota: warn the user that output report may appear different.

[Add Check]: ophys checks to add

image_mask dimensions do not match dimensions of associated ImageSeries
wavelength is not in nm

Cannot import extension module prior to reading NWB file

Some NWB files with extensions cannot be loaded purely with NWBHDF5IO(path, load_namespaces=True). They require the API classes to be loaded first, e.g., import ndx_franklab_novela.

[Feature]: parallelize over NWB files

What would you like to see added to the NWBInspector?

the -j --njobs flag should control number of simultaneous jobs that run inspector over NWB files in parallel

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

Check for DynamicTableRegion values that are invalid

problem with compression test

This is showing up for many datasets:

BEST PRACTICE VIOLATION
-----------------------
1.1.1   NWBFile 'root' located in '/'
        check_dataset_compression: Consider enabling compression when writing a large dataset.

It's not clear what dataset is causing the violation.

[Add Check]: check session start time

Should probably check the session start time has been set properly within a file; not just as a datetime object, but as a valid date range for performing neurophysiology studies. E.g., if session_start_time <= datetime(1970, 1, 1) print "Session start time may not be set to the true date of the recording."

Check Subject Age is (a) specified and (b) ISO8601

In advance of DANDI intention flag feature, need to add a check that subject ID is specified. Without DANDI intention this is probably just a BEST_PRACTICE_SUGGESTION.

Also need to add a check that if subject ID is specified, it is specified in ISO8601. Without DANDI intention this is probably a BEST_PRACTICE_VIOLATION since they did take the time to fill it out.

But both of these will get elevated to CRITICAL when run with intention=DANDI_UPLOAD or DANDI_PUBLISH.

[Feature]: an auto-fixer for missing metadata

What would you like to see added to the NWBInspector?

An idea that came up during a recent meeting with some BIDS folk would be to have a nice user-friendly way of automatically fixing common issues that come up during a run of the NWBInspector.

Clearly, some types of Best Practice violations, such as wrong data orientation, no data compression/chunking, etc. would require a complete reconversion of the data, so are not applicable here.

Likewise, the overlay/sidecar project Ryan is working on would be required for editing the form of any metadata violations (like ISO 8601, subject naming conventions, etc.) for fields that had already been written to the file.

There is, however, one case where this sort of feature could be quite useful: missing metadata

The feature would be as follows:

[Optional] A fully interactive CLI would ask the user to explicitly type the values for each of the missing metadata fields that the NWB Best Practices suggests including.
At the end of the interactive session (or written manually), a metadata .yml file (with structure very similar to how we do this in the NWBCT) would be saved and the user would be asked to review it one more time and confirm the values within it are accurate.
With one more CLI call, the user can run the auto-fixer on the saved .yml file, which would result in all the missing metadata from the .yml being written to the NWBFile(s) in append mode, and deleting the source .yml file to keep things clean (only if no errors occurred during the process). Basically the idea is to 'fold' the .yml content into the NWBFile.
[Future] When sidecars are available, this process can harness them to cover not just missing metadata, but also existing text fields that need to be updated.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

Add "config file" support

not yet sure about all "features" but such config file should support but it must allow to specify

which checks for which paths to ignore

This is largely inspired by config file in bids-validator, see https://github.com/bids-standard/bids-validator#configuration

It is needed since some of such ignores are "per dataset" and thus it should be possible to specify what to ignore in a specific dataset (or dandiset) in DANDI land, instead of relying on user to know what command line options to provide.

[Documentation]: Update issue/PR labels to follow the organization default lables

What would you like changed or added to the documentation and why?

https://github.com/organizations/NeurodataWithoutBorders/settings/repository-defaults

Do you have any interest in helping write or edit the documentation?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this change was not already requested?

[Feature]: per dataset configuration file to skip/ignore false positives

What would you like to see added to the NWBInspector?

Originally requested in #37 . I think that the implementation in #86 went on a different "tangent" from what my bids-validator-rotten mind had in mind: e.g. given the "stock" (provided by nwbinspector) dandi configuration I want to disable some tests/checks, e.g. reacting to some benign check_data_orientation as given in the example of dandi/handbook#40 .

Then I would have created some .nwbinspector.cfg or alike which would have instructed to ignore such errors if nwbinspector was ran in this folder.

Also #86 has TODO: expose to command line -- was that done?

Do you have any interest in helping implement the feature?

No.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

[Add Check]: check for spike times during unobserved intervals

Add badges to README

Just need to add the little badge displays for version, tests, coverage, etc.

[New Check]: Check Subject age or dob

What would you like to see added to the NWBInspector?

I'm getting this:

1.1.2   Subject 'subject' located in ''
        check_subject_age: Subject is missing age.

even if the Subject has date_of_birth defined.
I think check_subject_age should be changed to check_subject_age_or_dob

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this feature was not already requested?

[Add Check]: unknown or default start times

What type of check would you like to see added to the NWBInspector?

At times in the past, if a starting_time for some TimeSeries was unknown we set it to np.nan to indicate this fact. I just wanted to raise the question here officially to see if this is indeed encouraged or if the default behavior makes sense.

It's not stated in the best practices, and default behavior seems to assume that it's always 0.0, but especially in cases where multiple data streams are incoming to the NWBFile this may not be correct due to synchronizing offsets.

Pseudocode or other logic of the check

No response

Do you have any interest in helping implement the check function?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this check was not already requested?

[Documentation]: MatNWB UUID alternative?

What would you like changed or added to the documentation and why?

@bendichter or @lawrence-mbf In the Best Practices for the NWBFile identifier, I instruct PyNWB users towards the UUID project. What is the supported analog in MatNWB and do you have a link to a tutorial or other instructions that I can include?

Do you have any interest in helping write or edit the documentation?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you ensured this change was not already requested?

Check Subject ID

In advance of DANDI intention flag feature, need to add a check that subject ID is specified. Without DANDI intention this is probably just a BEST_PRACTICE_SUGGESTION, but will get elevated to CRITICAL when run with intention=DANDI_UPLOAD or DANDI_PUBLISH.

Check for unordered timestamps

Do we need both license and Legal?

@oruebel @bendichter

Two files in repo

https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/Legal.txt

and

https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/license.txt

do we need them both?

[Minor performance]: organization and main loop iterations

Testing out this new button on GitHub - when you go into the ... options in a GitHub discussion and select 'raise in new issue'.

Anyway, this is super low priority ATM, but leaving here for future reference. Especially now we have parallelization functioning well, performance per job is less of an issue.

An analogous approach can also be made to improve the main iteration loops for the NWBFile objects vs. applicable neurodata_types in inspect_nwb.

From Ben's original posting:

how about this. I create a objects_by_attr dict so that the objects are only looped over once per recursion

from collections import defaultdict

def hier_org(objects, levels, reverse=None):
    if reverse is None:
        reverse = [False] * len(levels)
    unique_attrs = sorted(set(getattr(object, levels[0]) for object in objects))
    if reverse[0]:
        unique_attrs = unique_attrs[::-1]
    objects_by_attr = defaultdict(list)
    [objects_by_attr[getattr(obj, levels[0])].append(obj) for obj in objects]
    if len(levels) > 1:
        return {attr: hier_org(objects_by_attr[attr], levels[1:], reverse[1:]) for attr in unique_attrs}
    else:
        return {attr: objects_by_attr[attr][0] for attr in unique_attrs}
    
    

# test
from collections import namedtuple

Obj = namedtuple("Obj", "a b c")

objects = []
for a in range(3):
    for b in range(3):
        for c in range(3):
            objects.append(Obj(a=a, b=b, c=c))
            
hier_org(objects, ["a", "b", "c"], [True, False, True])

Originally posted by @bendichter in #89 (comment)

	elif len(uniq) <= 4:
	print("NOTE: '%s' %s data has only unique values %s" % (ts.name, type(ts).__name__, uniq))

neurodatawithoutborders / nwbinspector Goto Github PK

nwbinspector's People

Stargazers

Watchers

Forkers

nwbinspector's Issues

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like changed or added to the documentation and why?

Do you have any interest in helping write or edit the documentation?

Code of Conduct

What would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like changed or added to the documentation and why?

Do you have any interest in helping write or edit the documentation?

Code of Conduct

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

Description

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What happened?

Steps to Reproduce

Traceback

Operating System

Python Executable

Python Version

Package Versions

Code of Conduct

What would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What would you like changed or added to the documentation and why?

Do you have any interest in helping write or edit the documentation?

Code of Conduct

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What would you like to see added to the NWBInspector?

Do you have any interest in helping implement the feature?

Code of Conduct

What type of check would you like to see added to the NWBInspector?

Pseudocode or other logic of the check

Do you have any interest in helping implement the check function?

Code of Conduct

What would you like changed or added to the documentation and why?

Do you have any interest in helping write or edit the documentation?

Code of Conduct

Recommend Projects

Recommend Topics

Recommend Org