Bug: cannot query metadata with unsupported annotations by kbolashev · Pull Request #653 · DagsHub/client

kbolashev · 2026-02-15T14:31:42Z

This PR addresses two bugs:

Bug 1: unsupported metadata annotations break `ds.all()`, if they were also tagged as document.

Reproduction:

Create datasource and upload metadata that is not currently supported by the converter (e.g. audio metadata)
Make sure that the metadata has the DOCUMENT tag on it (will happen automatically right now on DagsHub)
Try to do ds.all()

Expected:
The warning shows up, but otherwise the program continues execution.

Actual:
Following stack trace, that happens due to trying to read the annotation content a second time:

  File "/Users/kirillbolashev/Job/GitRepos/client-test-suite/manual_tests/bugs/querying/autoloading_bugs.py", line 10, in load_audio_dataset
    qr = ds.all()
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/datasource.py", line 337, in all
    return ds.fetch(load_documents=load_documents, load_annotations=load_annotations)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/datasource.py", line 289, in fetch
    res._load_autoload_fields(documents=load_documents, annotations=load_annotations)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/query_result.py", line 1105, in _load_autoload_fields
    self.get_annotations()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/query_result.py", line 679, in get_annotations
    self.get_blob_fields(*self.annotation_fields, **kwargs)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/query_result.py", line 418, in get_blob_fields
    self._convert_annotation_fields(*fields, load_into_memory=load_into_memory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kirillbolashev/Job/GitRepos/client/dagshub/data_engine/model/query_result.py", line 451, in _convert_annotation_fields
    dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()
                       ~~~~^^^^^^^^^^^^^^^^^^
  File "/Users/kirillbolashev/.pyenv/versions/3.13.8/lib/python3.13/pathlib/_local.py", line 503, in __init__
    super().__init__(*args)
    ~~~~~~~~~~~~~~~~^^^^^^^
  File "/Users/kirillbolashev/.pyenv/versions/3.13.8/lib/python3.13/pathlib/_local.py", line 132, in __init__
    raise TypeError(
    ...<2 lines>...
        f"not {type(path).__name__!r}")
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'bytes'

Solution:

Created a new subclass of MetadataAnnotations named UnsupportedMetadataAnnotations, that prevents interacting with the annotations, but still allows to get the value via .value and .to_ls_task(), making it so qr.get_annotations() acts consistently.
No matter if an annotation is valid or not, all metadata values of this field will be of MetadataAnnotations

This makes it so that trying to load the annotation a second time will fall through due to this line:

client/dagshub/data_engine/model/query_result.py

Line 451 in 511801c

if isinstance(dp.metadata[fld], MetadataAnnotations):

This bug has a reproduction test added, that fails on current main, and is fixed in this PR

Bug 2: Possible to load an annotation as a document a second time, despite the annotation already having been parsed

I couldn't get an actual reproduction for it, aside from the stack trace that was throwing in a similar way, but in a function get_annotations, leading to get_blob_fields:

client/dagshub/data_engine/model/query_result.py

Line 430 in 4e254d4

dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()

TypeError: expected str, bytes or os.PathLike object, not MetadataAnnotations`

Solution:

Added a safety guard against it at this line, and added a test that loads the annotations a second time.

Implementation Details

Core Components

NotImplementedMeta Metaclass (dagshub/data_engine/util/not_implemented.py)
A new metaclass that automatically creates stubs for inherited methods and properties. For each attribute not overridden in a subclass, it either injects a property that raises NotImplementedError on access (for properties) or a method that raises NotImplementedError when called (for callables). This enables subclasses to explicitly indicate unsupported functionality while maintaining a consistent interface.

UnsupportedMetadataAnnotations Class (dagshub/data_engine/annotation/metadata.py)
A new subclass of MetadataAnnotations using NotImplementedMeta as its metaclass. It wraps unsupported annotation values (e.g., audio metadata) and exposes them through value and to_ls_task() properties without attempting full metadata annotation processing. This allows consistent handling of unsupported annotations across the codebase while preventing type errors when accessing unsupported features.

Query Processing Improvements (`dagshub/data_engine/model/query_result.py`)

The blob and annotation loading logic now includes safety guards:

Skip re-loading: When processing fields that are already MetadataAnnotations instances, the loader skips further processing to prevent double-loading errors
Graceful fallback: When annotation validation fails, the original value is wrapped in UnsupportedMetadataAnnotations instead of raising an exception, ensuring downstream code can still access the raw annotation data
Conditional memory loading: Content is force-loaded into memory when load_into_memory is False or when the existing field is a Path, then decoded to UTF-8

Test Coverage

Tests now exercise the critical double-loading scenario with a new fixture (ds_with_unsupported_annotation) and explicit multi-load verification. Test mocks were enhanced to support both bytes and PathLike return types, allowing more realistic blob retrieval simulation. Audio annotation test data (audio_annotation.json) provides a concrete example of unsupported metadata for testing purposes.

dagshub · 2026-02-15T14:31:46Z

Join the discussion on DagsHub!

coderabbitai · 2026-02-15T14:31:57Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This change introduces support for handling unsupported metadata annotations by adding a new UnsupportedMetadataAnnotations class backed by a NotImplementedMeta metaclass. The annotation loading logic is updated to gracefully wrap invalid annotations instead of failing, and tests are refactored to verify this behavior.

Changes

Cohort / File(s)	Summary
Core Metaclass Support `dagshub/data_engine/util/not_implemented.py`	New `NotImplementedMeta` metaclass that injects stubs raising `NotImplementedError` for inherited methods and properties not explicitly overridden in subclasses.
Metadata Annotation Handling `dagshub/data_engine/annotation/metadata.py`	New `UnsupportedMetadataAnnotations` class extending `MetadataAnnotations` with `NotImplementedMeta` metaclass; stores and exposes original_value without full metadata annotation support.
Query Result Processing `dagshub/data_engine/model/query_result.py`	Updated blob loading and annotation conversion logic to skip already-loaded `MetadataAnnotations`, conditionally force-load content into memory, and wrap validation errors in `UnsupportedMetadataAnnotations` instead of failing.
Test Infrastructure `tests/data_engine/annotation_import/test_annotation_parsing.py`	Refactored fixtures with `_ds_with_annotation` helper; added `ds_with_unsupported_annotation` fixture; updated `mock_get_blob` to return `Union[bytes, PathLike]`; added imports for `QueryResult`, `UnsupportedMetadataAnnotations`, and type hints.
Test Fixture Data `tests/data_engine/annotation_import/res/audio_annotation.json`	New JSON test fixture containing complete audio annotation payload with metadata and result items.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bug/document-annotation-fields

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-code-review · 2026-02-15T14:32:00Z

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

Review Summary by Qodo

Fix unsupported annotations breaking queries and double-loading

🐞 Bug fix

Walkthroughs

Description

• Handle unsupported metadata annotations gracefully without breaking queries
• Create UnsupportedMetadataAnnotations subclass to prevent double-parsing errors
• Add defensive checks to prevent re-processing already-converted annotation fields
• Improve error messaging for unsupported or invalid annotations

Diagram

flowchart LR
  A["Unsupported Annotation"] --> B["UnsupportedMetadataAnnotations"]
  B --> C["Prevents Double-Parsing"]
  C --> D["Query Succeeds"]
  E["Defensive Checks"] --> F["Skip MetadataAnnotations"]
  F --> D

File Changes

1. dagshub/data_engine/annotation/metadata.py ✨ Enhancement +29/-7

Create UnsupportedMetadataAnnotations for graceful handling

• Added UnsupportedMetadataAnnotations subclass using NotImplementedMeta metaclass
• Stores original bytes value for unsupported annotations
• Implements value property and to_ls_task() method to return original content
• Prevents interaction with unsupported annotations while preserving data access

dagshub/data_engine/annotation/metadata.py

2. dagshub/data_engine/model/query_result.py 🐞 Bug fix +14/-6

Add defensive checks and unsupported annotation handling

• Added import for UnsupportedMetadataAnnotations
• Added defensive check in _get_blob_fn to skip MetadataAnnotations instances
• Added check in _convert_annotation_fields to skip already-loaded annotations
• Wrap ValidationError exceptions by creating UnsupportedMetadataAnnotations instead
• Improved warning message to mention "unsupported or invalid annotations"

dagshub/data_engine/model/query_result.py

3. dagshub/data_engine/util/not_implemented.py ✨ Enhancement +48/-0

Implement NotImplementedMeta metaclass for method stubbing

• New metaclass NotImplementedMeta that automatically raises NotImplementedError
• Replaces all non-overridden parent methods and properties with error-raising stubs
• Handles both regular methods and properties correctly
• Skips private attributes (starting with underscore)

dagshub/data_engine/util/not_implemented.py

View more (3)

4. tests/data_engine/annotation_import/test_annotation_parsing.py 🧪 Tests +54/-7

Add tests for double-loading and unsupported annotations

• Refactored mock_get_blob to support both in-memory and path-based returns
• Extracted common fixture logic into _ds_with_annotation helper function
• Added ds_with_unsupported_annotation fixture for testing unsupported annotations
• Added test_double_loading_annotation_works to verify double-loading doesn't break
• Added test_handling_unsupported_annotation to verify unsupported annotations work correctly
• Extracted annotation assertion logic into _test_annotation helper

tests/data_engine/annotation_import/test_annotation_parsing.py

5. tests/data_engine/annotation_import/res/audio_annotation.json 🧪 Tests +82/-0

Add audio annotation test fixture file

• New test fixture file containing audio metadata annotation in Label Studio format
• Includes audio data reference and multiple annotation types (choices, rating, textarea)
• Used to test handling of unsupported annotation types

tests/data_engine/annotation_import/res/audio_annotation.json

6. dagshub/data_engine/util/__init__.py Additional files +0/-0

...

dagshub/data_engine/util/init.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

tests/data_engine/annotation_import/res/audio_annotation.json (1)

44-71: Consider using fake PII in test fixtures.

The test data contains what appears to be a real email address and name. Even in test fixtures, it's better practice to use obviously fake data (e.g., test@example.com, John, Doe) to avoid any compliance concerns around PII in source control.

dagshub/data_engine/util/not_implemented.py (1)

9-12: Edge case: dir(base) includes inherited attributes from object and all ancestors.

dir(base) returns attributes from the entire MRO, not just base itself. If MetadataAnnotations inherits public methods from other classes in the future, those will also be replaced with NotImplementedError in the subclass. This is likely the desired behavior, but worth being aware of.

Also, staticmethod and classmethod descriptors from the base are not detected separately — they'll be seen as callable and wrapped as instance methods, which changes their calling convention. Currently not a problem since from_ls_task (a @staticmethod) is always called on the base class directly, but could be surprising if the pattern is reused.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c964d0 and e4ace64.

📒 Files selected for processing (6)

dagshub/data_engine/annotation/metadata.py
dagshub/data_engine/model/query_result.py
dagshub/data_engine/util/__init__.py
dagshub/data_engine/util/not_implemented.py
tests/data_engine/annotation_import/res/audio_annotation.json
tests/data_engine/annotation_import/test_annotation_parsing.py

🧰 Additional context used

🧬 Code graph analysis (2)

dagshub/data_engine/annotation/metadata.py (3)

dagshub/common/helpers.py (1)

log_message (100-107)

dagshub/data_engine/util/not_implemented.py (1)

NotImplementedMeta (1-48)

dagshub/data_engine/model/datapoint.py (1)

Datapoint (29-263)

tests/data_engine/annotation_import/test_annotation_parsing.py (2)

dagshub/data_engine/annotation/metadata.py (7)

MetadataAnnotations (36-319)

UnsupportedMetadataAnnotations (322-339)

add_image_bbox (146-183)

value (104-114)

value (332-333)

to_ls_task (88-101)

to_ls_task (335-336)

dagshub/data_engine/model/query_result.py (1)

get_blob_fields (334-437)

🪛 Ruff (0.15.0)

tests/data_engine/annotation_import/test_annotation_parsing.py

[warning] 64-64: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Agent
GitHub Check: build (3.11)
GitHub Check: build (3.10)
GitHub Check: build (3.9)
GitHub Check: build (3.13)
GitHub Check: build (3.12)

🔇 Additional comments (5)

dagshub/data_engine/annotation/metadata.py (1)

322-339: LGTM!

The UnsupportedMetadataAnnotations class correctly wraps unsupported annotations while preserving value and to_ls_task behavior. The metaclass will replace all non-overridden public methods (like add_image_bbox, add_image_segmentation, etc.) with NotImplementedError, which is the intended behavior.

dagshub/data_engine/model/query_result.py (2)

429-435: Good defensive guard against re-processing parsed annotations as documents.

This correctly prevents the TypeError: expected str, bytes or os.PathLike object, not MetadataAnnotations by skipping fields that were already parsed as annotations. Solid fix for Bug 2.

449-463: LGTM — annotation conversion guards and unsupported annotation fallback.

The isinstance guard at line 451 prevents double-loading, and the ValidationError catch at line 460 gracefully wraps invalid annotations in UnsupportedMetadataAnnotations instead of crashing.

tests/data_engine/annotation_import/test_annotation_parsing.py (2)

99-115: LGTM — good coverage of basic annotation parsing and double-load safety.

test_double_loading_annotation_works at line 104 is a valuable regression test ensuring that calling get_blob_fields a second time doesn't crash or corrupt the already-parsed annotations.

118-134: Good test coverage for unsupported annotations.

Verifies the UnsupportedMetadataAnnotations subclass behavior: isinstance checks, NotImplementedError on unsupported methods, and content preservation via value/to_ls_task.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

tests/data_engine/annotation_import/test_annotation_parsing.py

Copilot

Pull request overview

Fixes Data Engine metadata blob handling so unsupported/invalid Label Studio annotations no longer break Datasource.all() / QueryResult.get_annotations(), and adds regression tests for double-loading behavior.

Changes:

Add UnsupportedMetadataAnnotations to represent annotations that fail parsing while keeping consistent MetadataAnnotations typing.
Harden QueryResult.get_blob_fields() conversions to avoid re-processing already-parsed annotation fields and improve handling of path-vs-bytes blobs.
Extend annotation parsing tests and add an “unsupported” annotation fixture.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`dagshub/data_engine/model/query_result.py`	Adjusts blob/document/annotation conversion logic and introduces unsupported-annotation wrapper usage.
`dagshub/data_engine/annotation/metadata.py`	Adds `UnsupportedMetadataAnnotations` type for unsupported/invalid LS tasks.
`dagshub/data_engine/util/not_implemented.py`	Introduces a metaclass to block inherited methods/properties on unsupported annotations.
`tests/data_engine/annotation_import/test_annotation_parsing.py`	Adds regression tests for double-loading and unsupported annotations.
`tests/data_engine/annotation_import/res/audio_annotation.json`	Adds a fixture representing an unsupported (audio) LS task.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tests/data_engine/annotation_import/test_annotation_parsing.py

Copilot · 2026-02-15T14:37:02Z

dagshub/data_engine/model/query_result.py

+                        # Defensive check to not mangle annotation fields by accident
+                        if isinstance(dp.metadata[fld], MetadataAnnotations):
+                            continue
+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):
                            dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()
                        dp.metadata[fld] = dp.metadata[fld].decode("utf-8")


When _get_blob fails it returns a string like "Error while downloading binary blob: ..." (see Datapoint._get_blob). In that case this conversion path will call Path(<error string>).read_bytes() and raise FileNotFoundError, turning a download warning into a hard failure. Add a guard before Path(...).read_bytes() to detect error strings / non-path values (or make _get_blob raise and handle it), and skip decoding/conversion for that field when download failed.

That also sounds correct

Copilot · 2026-02-15T14:37:02Z

dagshub/data_engine/model/query_result.py

                        # Already loaded - skip
                        if isinstance(dp.metadata[fld], MetadataAnnotations):
                            continue
-                        # Override the load_into_memory flag, because we need the contents
-                        if not load_into_memory:
+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):
                            dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()
                        try:
                            dp.metadata[fld] = MetadataAnnotations.from_ls_task(
                                datapoint=dp, field=fld, ls_task=dp.metadata[fld]
                            )
                        except ValidationError:
+                            dp.metadata[fld] = UnsupportedMetadataAnnotations(
+                                datapoint=dp, field=fld, original_value=dp.metadata[fld]
+                            )


Similarly for annotation conversion: if _get_blob returned an error string (or any unexpected non-path, non-bytes value), Path(dp.metadata[fld]).read_bytes() will raise and the exception is not caught (only ValidationError is handled). Add a type/value guard before reading from disk, and consider treating download failures separately from ValidationError so unsupported/invalid annotations don’t crash get_blob_fields().

tests/data_engine/annotation_import/res/audio_annotation.json

dagshub/data_engine/annotation/metadata.py

Copilot · 2026-02-15T14:37:03Z

dagshub/data_engine/util/not_implemented.py

@@ -0,0 +1,48 @@
+class NotImplementedMeta(type):
+    """
+    A metaclass that replaces all parent class method and properties that aren't overridden in the subclass


Docstring grammar: “method and properties” should be “methods and properties”.

Suggested change

A metaclass that replaces all parent class method and properties that aren't overridden in the subclass

A metaclass that replaces all parent class methods and properties that aren't overridden in the subclass

Copilot · 2026-02-15T14:37:03Z

dagshub/data_engine/util/not_implemented.py

+        # Get all attributes from base classes
+        for base in bases:
+            for attr_name in dir(base):
+                if attr_name.startswith("_"):
+                    continue


NotImplementedMeta iterates over dir(base) which includes attributes coming from type (e.g. mro) and other inherited members that are not actually defined on the parent class. This means the metaclass can unintentionally overwrite built-in class/introspection behavior on subclasses. Consider iterating over base.__dict__ instead (and handling staticmethod/classmethod descriptors explicitly) to only wrap methods/properties actually declared on the parent class.

qodo-code-review · 2026-02-15T14:38:03Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0)

✅ 1. __repr__ leaks raw bytes 📘 Rule violation ⛨ Security

Description

UnsupportedMetadataAnnotations.__repr__() embeds _original_value directly, which can expose
potentially sensitive annotation content via logs, tracebacks, or debug output. This conflicts with
secure logging/error-handling expectations and can leak user-provided data.

Code

dagshub/data_engine/annotation/metadata.py[R338-339]

+    def __repr__(self):
+        return f"Unsupported LS annotations with original value: {self._original_value}"

Evidence
Secure logging/error handling requires avoiding sensitive data exposure; the new __repr__ returns
a string containing the full original annotation bytes, which may include PII or other sensitive
content and can surface in logs or user-visible errors.
Rule 5: Generic: Secure Logging Practices
Rule 4: Generic: Secure Error Handling
dagshub/data_engine/annotation/metadata.py[338-339]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`UnsupportedMetadataAnnotations.__repr__()` currently includes the full `_original_value` bytes in the returned string, which may leak sensitive annotation content through logs, tracebacks, or debug printing.
## Issue Context
This object may be implicitly stringified (logging, exceptions, debugging). To align with secure logging/error handling, the representation should avoid printing raw content.
## Fix Focus Areas
- dagshub/data_engine/annotation/metadata.py[338-339]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Warning logs full datapoints 📘 Rule violation ⛨ Security

Description

The warning for unsupported/invalid annotations logs full datapoint paths, which may contain
sensitive identifiers. This reduces log hygiene and can leak sensitive information into application
logs.

Code

dagshub/data_engine/model/query_result.py[R471-478]

+                "Warning: The following datapoints had unsupported or invalid annotations, "
               "any annotation-related operations will not work on these:"
           )
           err_msg = ""
           for fld, dps in bad_annotations.items():
-                err_msg += f'Field "{fld}" in datapoints:\n\t'
+                err_msg += f'\nField "{fld}" in datapoints:\n\t'
               err_msg += "\n\t".join(dps)
           log_message(err_msg)

Evidence
Secure logging guidance requires avoiding sensitive data in logs; the updated warning builds and
logs a message containing field names and the complete list of affected datapoint paths.
Rule 5: Generic: Secure Logging Practices
dagshub/data_engine/model/query_result.py[471-478]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The warning emitted for unsupported/invalid annotations logs full datapoint paths, which may embed sensitive identifiers and should not be written to general logs.
## Issue Context
The code constructs `err_msg` by joining all datapoint paths and then calls `log_message(err_msg)`.
## Fix Focus Areas
- dagshub/data_engine/model/query_result.py[471-478]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Metaclass too broad 🐞 Bug ⛯ Reliability

Description

NotImplementedMeta iterates dir(base) and replaces every callable/property not overridden, which
can unintentionally override inherited type attributes (e.g., mro) and descriptor semantics,
leading to surprising runtime/introspection behavior for UnsupportedMetadataAnnotations. This is
risky and hard to debug if any code (or tooling) relies on class attributes inherited from type or
uses descriptors/classmethods.

Code

dagshub/data_engine/util/not_implemented.py[R7-46]

+    def __new__(mcs, name, bases, namespace):
+        # Get all attributes from base classes
+        for base in bases:
+            for attr_name in dir(base):
+                if attr_name.startswith("_"):
+                    continue
+
+                # Skip if already defined in subclass
+                if attr_name in namespace:
+                    continue
+
+                base_attr = getattr(base, attr_name)
+
+                # Handle properties
+                if isinstance(base_attr, property):
+                    # Create a property that raises NotImplementedError
+                    def make_not_implemented_property(prop_name):
+                        def getter(self):
+                            raise NotImplementedError(f"Property '{prop_name}' not implemented")
+
+                        def setter(self, value):
+                            raise NotImplementedError(f"Property '{prop_name}' not implemented")
+
+                        def deleter(self):
+                            raise NotImplementedError(f"Property '{prop_name}' not implemented")
+
+                        return property(getter, setter, deleter)
+
+                    namespace[attr_name] = make_not_implemented_property(attr_name)
+
+                # Handle regular methods
+                elif callable(base_attr):
+
+                    def make_not_implemented(method_name):
+                        def not_impl(self, *args, **kwargs):
+                            raise NotImplementedError(f"Method '{method_name}' not implemented")
+
+                        return not_impl
+
+                    namespace[attr_name] = make_not_implemented(attr_name)

Evidence

The metaclass walks dir(base) (includes inherited attributes, not just attributes defined on the
base class) and replaces any callable/property with a stub raising NotImplementedError, without
filtering to base.__dict__ or preserving descriptor types (staticmethod/classmethod).

dagshub/data_engine/util/not_implemented.py[7-46]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`NotImplementedMeta` currently iterates `dir(base)` and replaces any callable/property not overridden. This can also catch attributes inherited from `type` and can change descriptor semantics.
### Issue Context
The intent is to disable the *instance API* of `MetadataAnnotations` for `UnsupportedMetadataAnnotations`, not to shadow inherited class/type behaviors.
### Fix Focus Areas
- dagshub/data_engine/util/not_implemented.py[7-46]
### Suggested approach
- Iterate `for attr_name, raw in base.__dict__.items()` instead of `dir(base)`.
- Skip dunder/private names.
- Preserve descriptor wrappers:
- if `isinstance(raw, staticmethod)`: wrap the underlying function in a `staticmethod` that raises
- if `isinstance(raw, classmethod)`: wrap in a `classmethod` that raises
- if `isinstance(raw, property)`: create a property raising
- if `inspect.isfunction(raw)`: create a normal method raising
- Consider an allowlist of methods/properties that should remain usable (e.g., `value`, `to_ls_task`, `__repr__`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. Document Path(bytes) risk 🐞 Bug ✓ Correctness

Description

Document conversion still unconditionally runs Path(dp.metadata[fld]).read_bytes() whenever
load_into_memory=False, even if the field value is already bytes from a prior in-memory load.
Calling get_blob_fields on document fields multiple times with different load_into_memory values
can still raise TypeError: argument should be a str or os.PathLike, not 'bytes'.

Code

dagshub/data_engine/model/query_result.py[R429-435]

+                        # Defensive check to not mangle annotation fields by accident
+                        if isinstance(dp.metadata[fld], MetadataAnnotations):
+                            continue
+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):
                           dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()
                       dp.metadata[fld] = dp.metadata[fld].decode("utf-8")

Evidence

The document conversion block converts requested document fields regardless of prior state. Its
condition is based on load_into_memory and isinstance(..., Path) only, but does not handle the
case where the value is already bytes, and then passes it to Path(...).

dagshub/data_engine/model/query_result.py[425-435]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Document conversion can crash if a document field was previously loaded into `bytes` and then `get_blob_fields(..., load_into_memory=False)` is called again: the code will do `Path(bytes)`.
### Issue Context
This is a common pattern when users call `get_blob_fields` multiple times with different flags.
### Fix Focus Areas
- dagshub/data_engine/model/query_result.py[425-436]
### Suggested approach
- Normalize handling:
- `val = dp.metadata[fld]`
- if `isinstance(val, (Path, os.PathLike))`: `val = Path(val).read_bytes()`
- elif `isinstance(val, str)`:
  - if it is a filesystem path and exists, read bytes
  - else assume it is already decoded content and skip reading
- if `isinstance(val, bytes)`: `dp.metadata[fld] = val.decode(&amp;#x27;utf-8&amp;#x27;)`
- elif `isinstance(val, str)`: leave as-is
- Avoid `Path(val)` unless `val` is path-like.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

dagshub/data_engine/annotation/metadata.py

guysmoilov · 2026-02-15T21:10:49Z

dagshub/data_engine/annotation/metadata.py

+        return self._original_value
+
+    def __repr__(self):
+        return "Unsupported LS annotations"


Suggested change

return "Unsupported LS annotations"

return "Label Studio annotations of unrecognized type"

guysmoilov · 2026-02-15T21:15:59Z

dagshub/data_engine/model/query_result.py

+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):


I don't get the condition and the comment.

The comment says to ignore load_into_memory - then why do you check it? I assume it's because when load_into_memory == True, then in some earlier part of the code the field was already read as bytes. Then why not just assert that? I guess maybe because then the field content would be string or bytes and you can't know for sure whether that's the blob content or a path without relying on the assumed earlier side effects of the field (would be solved by the type safety we discussed, but out of scope in this PR).

Assuming everything I've guessed is correct - I think a comment explaining your assumptions would really help to make this code understandable

guysmoilov · 2026-02-15T21:16:28Z

dagshub/data_engine/model/query_result.py

+                        # Defensive check to not mangle annotation fields by accident
+                        if isinstance(dp.metadata[fld], MetadataAnnotations):
+                            continue
+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):
                            dp.metadata[fld] = Path(dp.metadata[fld]).read_bytes()
                        dp.metadata[fld] = dp.metadata[fld].decode("utf-8")


That also sounds correct

guysmoilov · 2026-02-15T21:18:00Z

dagshub/data_engine/model/query_result.py

+                        # Force load the content into memory, even if load_into_memory was set to False
+                        if not load_into_memory or isinstance(dp.metadata[fld], Path):


This condition should probably be refactored to a function and made DRY

guysmoilov · 2026-02-15T21:18:28Z

dagshub/data_engine/model/query_result.py

+                            )
                            bad_annotations[fld].append(dp.path)
                    else:
+                        # Empty annotation container


I recommend flipping the if-else order to make it readable. Or even just use continue and get rid of a nesting level.

guysmoilov · 2026-02-15T21:19:48Z

dagshub/data_engine/model/query_result.py

+                "Warning: The following datapoints had unsupported or invalid annotations, "
                "any annotation-related operations will not work on these:"


IMO this warning message is too scary - the users can do whatever they want with the result just fine.
If what you mean is that they won't be able to use our fancy add_bounding_box etc methods - then just say that. The users don't know what you mean.

kbolashev added 2 commits February 15, 2026 16:02

Better handling of unsupported LS annotations

511801c

Test double loading annotations

e4ace64

kbolashev requested review from Copilot and guysmoilov February 15, 2026 14:31

kbolashev self-assigned this Feb 15, 2026

kbolashev added the bug Something isn't working label Feb 15, 2026

Copilot started reviewing on behalf of kbolashev February 15, 2026 14:32 View session

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

tests/data_engine/annotation_import/test_annotation_parsing.py Show resolved Hide resolved

Copilot AI reviewed Feb 15, 2026

View reviewed changes

qodo-code-review bot reviewed Feb 15, 2026

View reviewed changes

dagshub/data_engine/annotation/metadata.py Outdated Show resolved Hide resolved

kbolashev added 2 commits February 15, 2026 16:38

Fix bad load_into_memory check in mock_get_blob

fd10d7e

More PR fixes

4db51a9

guysmoilov requested changes Feb 15, 2026

View reviewed changes

	A metaclass that replaces all parent class method and properties that aren't overridden in the subclass
	A metaclass that replaces all parent class methods and properties that aren't overridden in the subclass

	return "Unsupported LS annotations"
	return "Label Studio annotations of unrecognized type"

		# Force load the content into memory, even if load_into_memory was set to False
		if not load_into_memory or isinstance(dp.metadata[fld], Path):

		"Warning: The following datapoints had unsupported or invalid annotations, "
		"any annotation-related operations will not work on these:"

Conversation

kbolashev commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug 1: unsupported metadata annotations break ds.all(), if they were also tagged as document.

Solution:

Bug 2: Possible to load an annotation as a document a second time, despite the annotation already having been parsed

Solution:

Implementation Details

Core Components

Query Processing Improvements (dagshub/data_engine/model/query_result.py)

Test Coverage

Uh oh!

dagshub bot commented Feb 15, 2026

Uh oh!

coderabbitai bot commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Uh oh!

qodo-code-review bot commented Feb 15, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review bot commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

guysmoilov Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

kbolashev commented Feb 15, 2026 •

edited

Loading

Bug 1: unsupported metadata annotations break `ds.all()`, if they were also tagged as document.

Query Processing Improvements (`dagshub/data_engine/model/query_result.py`)

coderabbitai bot commented Feb 15, 2026 •

edited

Loading

qodo-code-review bot commented Feb 15, 2026 •

edited

Loading