Add check for empty datasets in NWB containers#584
Add check for empty datasets in NWB containers#584anoushkajain wants to merge 3 commits intoNeurodataWithoutBorders:devfrom
Conversation
stephprince
left a comment
There was a problem hiding this comment.
Thanks for the PR @anoushkajain!
I think this looks good but there are a couple more pieces to add before merging.
- We may need to expand the check to other array-like dataset objects that could be present in NWB files that haven’t been written to disk yet. I've added a suggestion to the code related to this.
- Can you add tests including passing and failing cases with the different types of datasets.
- Can you add a new best practice section to the documentation recommending that objects with empty datasets should not be included in the file.
Also in the future, feel free to @ me or request a review whenever you are ready for a PR to be looked at!
| Inspector messages for each empty dataset found, or None if no empty datasets are found. | ||
| """ | ||
| for field_name, field in getattr(nwb_container, "fields", dict()).items(): | ||
| if not isinstance(field, (h5py.Dataset, zarr.Array)): |
There was a problem hiding this comment.
I think we may also want to check for other empty array-like objects here, including numpy arrays and DataIO objects. Could use something like:
| if not isinstance(field, (h5py.Dataset, zarr.Array)): | |
| is_array_data = hasattr(value, "shape") and hasattr(value, "dtype") | |
| if not is_array_data: |
| continue | ||
|
|
||
| # Check if the dataset has zero elements | ||
| if field.size == 0: |
There was a problem hiding this comment.
I am not sure if it will be necessary but for the DataIO objects you may need to use get_data_shape from the nwbinspector.utils module.
| return None | ||
|
|
||
|
|
||
| @register_check(importance=Importance.CRITICAL, neurodata_type=NWBContainer) |
There was a problem hiding this comment.
I am not sure yet whether this should be a critical check or a best practice violation. There was a discussion here about whether empty datasets should be allowed: NeurodataWithoutBorders/pynwb#2065.
Fix #324