Populate per-asset dataStandard for BIDS, HED, NWB, and extensions#1811
Populate per-asset dataStandard for BIDS, HED, NWB, and extensions#1811yarikoptic wants to merge 9 commits intomasterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1811 +/- ##
===========================================
- Coverage 76.27% 53.04% -23.23%
===========================================
Files 87 87
Lines 12484 12642 +158
===========================================
- Hits 9522 6706 -2816
- Misses 2962 5936 +2974
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| BIDS_DATASET_ERRORS = ("BIDS.MANDATORY_FILE_MISSING_PLACEHOLDER",) | ||
|
|
||
|
|
||
| def _add_standard( |
There was a problem hiding this comment.
line 542 of bases appears to have similar logic
| from .bases import ( | ||
| _SCHEMA_BAREASSET_HAS_DATASTANDARD, | ||
| GenericAsset, | ||
| LocalFileAsset, | ||
| NWBAsset, | ||
| ) |
Check notice
Code scanning / CodeQL
Cyclic import Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 15 days ago
To break the cycle with minimal behavioral change, keep only the class needed for immediate class definition (LocalFileAsset) as a top-level import, and lazily import the other .bases symbols (GenericAsset, NWBAsset, _SCHEMA_BAREASSET_HAS_DATASTANDARD) inside the methods/functions that use them.
In the shown snippet, we can safely reduce the top-level .bases import to only LocalFileAsset. This removes the cycle-triggering eager dependency footprint from module import time for three symbols. Then, in regions of dandi/files/bids.py (not shown here) where GenericAsset, NWBAsset, or _SCHEMA_BAREASSET_HAS_DATASTANDARD are referenced, add local imports in those functions/methods. For typing-only references, prefer from __future__ import annotations (already present) and/or if TYPE_CHECKING: imports.
Within the provided lines, the concrete edit is:
- Replace lines 21–27 with
from .bases import LocalFileAsset. - Replace the conditional block at lines 28–31 to avoid using
_SCHEMA_BAREASSET_HAS_DATASTANDARDat module import time (set a conservative default and load actual flag lazily where needed).
| @@ -18,17 +18,12 @@ | ||
| import dandi | ||
| from dandi.bids_validator_deno import bids_validate | ||
|
|
||
| from .bases import ( | ||
| _SCHEMA_BAREASSET_HAS_DATASTANDARD, | ||
| GenericAsset, | ||
| LocalFileAsset, | ||
| NWBAsset, | ||
| ) | ||
| from .bases import LocalFileAsset | ||
|
|
||
| if _SCHEMA_BAREASSET_HAS_DATASTANDARD: | ||
| from dandischema.models import hed_standard # type: ignore[attr-defined] | ||
| else: | ||
| hed_standard = None # type: ignore[assignment] | ||
| # Avoid eager access to .bases module state at import time to reduce cycle pressure. | ||
| # Resolve schema capability lazily at usage sites. | ||
| _SCHEMA_BAREASSET_HAS_DATASTANDARD = False | ||
| hed_standard = None # type: ignore[assignment] | ||
| from .zarr import ZarrAsset | ||
| from ..consts import ZARR_MIME_TYPE, dandiset_metadata_file | ||
| from ..metadata.core import add_common_metadata, prepare_metadata |
|
@yarikoptic lgtm! Let me just check on some of my own data |
|
Added NWB spec version extraction in 279c297.
Tested against dandiset 001636 (346 NWB files, macaque electrophysiology): Before: After: |
bendichter
left a comment
There was a problem hiding this comment.
ok, I made a minor correction. LGTM now!
- BIDSDatasetDescriptionAsset.get_metadata(): always sets BIDS standard (with BIDSVersion), adds HED when HEDVersion present in JSON, warns on read failure - NWBAsset.get_metadata(): sets NWB standard - ZarrBIDSAsset.get_metadata(): sets OME/NGFF for .ome.zarr assets - Guard for older dandischema without dataStandard on BareAsset; RuntimeError if dandischema >= 0.12.2 lacks it - Register ai_generated pytest marker in tox.ini Requires dandischema with per-asset dataStandard support (0.12.2+). Works silently with older dandischema. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add get_nwb_extensions() to pynwb_utils that reads h5file["specifications"] to discover ndx-* namespaces and their versions (filtering out core/hdmf) - NWBAsset.get_metadata() populates StandardsType.extensions with ndx-* extensions found in the NWB file - BIDSDatasetDescriptionAsset.get_metadata() extracts HED library schemas from list-valued HEDVersion (e.g. ["8.2.0", "sc:1.0.0"]) into extensions - Add tests for both NWB extension extraction and HED library schema parsing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…S_DATASTANDARD Single bool in bases.py guards all dataStandard/version/extensions features that ship together in dandischema >= 0.12.2, replacing scattered "field in Model.model_fields" checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move json, h5py, get_nwb_extensions, and _SCHEMA_BAREASSET_HAS_DATASTANDARD imports to module top level in tests. Keep pynwb_utils import deferred in bases.py (heavy transitive deps: h5py/pynwb/hdmf/numpy) per existing convention. Add import guidance to CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hed_standard does not exist in dandischema < 0.12.2, so gate the import on _SCHEMA_BAREASSET_HAS_DATASTANDARD (all new symbols ship together). HED detection is skipped when unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move _SCHEMA_BAREASSET_HAS_DATASTANDARD guard block after all imports to fix E402. Use getattr/setattr for dataStandard access and type: ignore[call-arg] for StandardsType(version=...) to satisfy mypy against released dandischema that lacks these fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the existing get_nwb_version() to populate the version field on the NWB StandardsType entry so per-asset metadata reports the NWB spec version (e.g. 2.9.0) alongside extensions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This enables CI to test against the unreleased dandischema that has the dataStandard field and StandardsType enhancements. TO BE REVERTED before merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
dataStandardfield (new in dandischema) for all standards:dataset_description.jsonwithBIDSVersionHEDVersionindataset_description.json, including library schemas (e.g."sc:1.0.0") asextensionsh5file["specifications"]group.ome.zarrassets in BIDS datasets_SCHEMA_BAREASSET_HAS_DATASTANDARDinbases.py— single bool gates all new dandischema features (dataStandard, version, extensions), with RuntimeError if dandischema >= 0.12.2 unexpectedly lacks the fieldget_nwb_extensions()topynwb_utils.py— reads HDF5specificationsgroup, filters core namespaces, returns{name: latest_version}for each extensionTest plan
TestBIDSDatasetDescriptionDataStandard— 5 tests covering BIDS always set, HED detected/absent, HED list form, HED library schemas as extensionstest_get_nwb_extensions— verifies ndx-* extraction with version sorting, core namespace filteringtest_get_nwb_extensions_no_specs— empty dict when no specifications groupPR in dandi-schema:
🤖 Generated with Claude Code