
Add support for video (MOT + CVAT Video) and COCO image (BBOX and Segmentation) import and export #652

Open

deanp70 wants to merge 24 commits into main from video_coco_converters

Conversation

@deanp70
Member

@deanp70 deanp70 commented Feb 10, 2026

Builds on the corresponding changes in dagshub-annotation-converter: DagsHub/dagshub-annotation-converter#25

Implementation Details

The PR extends annotation support through a multi-layered approach:

Importer Module Enhancements (importer.py):

  • Extended AnnotationType literal to include "coco", "mot", and "cvat_video"
  • Introduced is_video_format property to identify video-based annotation types
  • Added video-specific annotation handling via _flatten_video_annotations to convert per-frame annotations into single video-entry dictionaries
  • Implemented video conversion routing in convert_to_ls_tasks for Label Studio video task generation
  • Adjusted remap_annotations to handle cases where video format annotations lack explicit filenames
  • New internal helpers: _is_video_annotation_dict, _flatten_cvat_fs_annotations, _flatten_mot_fs_annotations
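
The video-dictionary detection and flattening described above can be sketched as follows. This is an illustrative reconstruction under assumed dict-of-lists shapes, not the PR's actual implementation; the helper names mirror _is_video_annotation_dict and _flatten_video_annotations in importer.py:

```python
from typing import Any, Dict, List


def is_video_annotation_dict(annotations: Any) -> bool:
    # Video loaders key annotations by integer frame index,
    # image loaders key them by filename (str); check the first key.
    if not isinstance(annotations, dict) or not annotations:
        return False
    return isinstance(next(iter(annotations)), int)


def flatten_video_annotations(
    per_frame: Dict[int, List[dict]], video_name: str = "video"
) -> Dict[str, List[dict]]:
    # Merge frame-indexed annotations into a single video-keyed entry,
    # preserving frame order.
    merged: List[dict] = []
    for frame in sorted(per_frame):
        merged.extend(per_frame[frame])
    return {video_name: merged}
```

This is why mixed per-frame dictionaries collapse into one datapoint entry before Label Studio task conversion.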

Metadata Annotations Support (metadata.py):

  • Added VideoRectangleAnnotation support in Label Studio task lookup registry
  • Introduced add_coco_annotation method to load COCO-format JSON and integrate annotations into metadata
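
The filename rewriting that add_coco_annotation performs can be sketched like this. The function name rewrite_coco_filenames is hypothetical; only the standard COCO JSON keys (images, file_name) are assumed:

```python
import json


def rewrite_coco_filenames(coco_json: str, datapoint_path: str) -> dict:
    # Parse a COCO-format JSON string and point every image record
    # at the target datapoint's path, so the annotations attach to it.
    coco = json.loads(coco_json)
    for image in coco.get("images", []):
        image["file_name"] = datapoint_path
    return coco


example = json.dumps({
    "images": [{"id": 1, "file_name": "local/cat.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 0, "bbox": [10, 20, 50, 60]}],
    "categories": [{"id": 0, "name": "cat"}],
})
rewritten = rewrite_coco_filenames(example, "data/images/cat.jpg")
```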

Query Result Export API (query_result.py):

  • Three new export methods:
    • export_as_coco() - generates COCO JSON with optional class mapping and custom output filename
    • export_as_mot() - supports single/multi-video export with optional dimension parameters
    • export_as_cvat_video() - handles video CVAT XML generation with video name customization
  • Added video annotation helpers: _get_all_video_annotations, _prepare_video_file_for_export, _get_annotation_filename
  • Introduced _resolve_annotation_field to auto-select available annotation field with alphabetical fallback
  • Refactored YOLO export to use the new annotation field resolution
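
A minimal sketch of the field-resolution logic described above, assuming the list of annotation fields has already been gathered; the exception type and message are illustrative, not the PR's actual code:

```python
from typing import List, Optional


def resolve_annotation_field(fields: List[str], explicit: Optional[str] = None) -> str:
    # An explicitly requested field always wins; otherwise fall back to
    # the alphabetically first available annotation field so the choice
    # is deterministic across runs.
    if explicit is not None:
        return explicit
    if not fields:
        raise ValueError("No annotation fields available to export")
    return sorted(fields)[0]
```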

Test Coverage:

  • test_coco.py (218 lines): Import/export flows, filename rewriting, field resolution, class mapping, output customization
  • test_mot.py (383 lines): Video annotation flattening, directory/zip import, MOT export structure, dimension handling
  • test_cvat_video.py (276 lines): CVAT video XML import/export, track/box element generation, multi-datapoint support

Code Quality and Testing Infrastructure:

  • Enhanced ruff.toml linting configuration with comprehensive rule set including:
    • Security checks (flake8-bandit)
    • Bug detection (flake8-bugbear)
    • Try/except pattern validation (tryceratops)
    • Unused argument detection and other design pattern checks
    • Per-file exceptions for __init__.py and conftest.py
  • Added DatasourceType.REPOSITORY assignment in test fixtures (tests/data_engine/conftest.py)
  • Added id property to MockRepoAPI test utility
  • Minor refactor in token_auth.py (no behavioral change - logical equivalence in boolean expression)

Documentation Note: Per PR comments, the supported formats list at dagshub.com/docs/use_cases/annotation/#supported-formats-for-annotation_imports requires manual update to reflect MOT, CVAT Video, COCO BBOX, and COCO Segmentation support.

@deanp70 deanp70 requested a review from kbolashev February 10, 2026 14:58
@dagshub

dagshub bot commented Feb 10, 2026

@qodo-code-review

Review Summary by Qodo

Add video and COCO annotation import/export support

✨ Enhancement


Walkthroughs

Description
• Add support for importing and exporting video annotations (MOT, CVAT Video)
• Add support for importing and exporting COCO image annotations (bbox, segmentation)
• Implement video annotation flattening and Label Studio video task conversion
• Fix error handling in blob download by moving the try/except into query_result.py
• Add annotation field resolution utility for automatic field detection
Diagram
flowchart LR
  A["Video Formats<br/>MOT, CVAT Video"] --> B["AnnotationImporter"]
  C["Image Formats<br/>COCO"] --> B
  B --> D["Flatten Video<br/>Annotations"]
  B --> E["Convert to<br/>Label Studio"]
  D --> F["QueryResult<br/>Export Methods"]
  E --> F
  F --> G["export_as_mot<br/>export_as_coco<br/>export_as_cvat_video"]


File Changes

1. dagshub/data_engine/annotation/importer.py ✨ Enhancement +97/-9

Add video and COCO annotation import/export support

• Extended AnnotationType to include "coco", "mot", and "cvat_video" formats
• Added is_video_format property to identify video annotation types
• Implemented video annotation import logic for MOT and CVAT Video formats with optional dimension parameters
• Added _is_video_annotation_dict() method to distinguish video (int-keyed) from image (str-keyed) annotations
• Added _flatten_video_annotations() method to merge frame-indexed video annotations into single entries
• Implemented _convert_to_ls_video_tasks() for converting video annotations to Label Studio format
• Updated remap_annotations() to handle video annotations without filenames
• Extended download_annotations() to support COCO and MOT formats

dagshub/data_engine/annotation/importer.py


2. dagshub/data_engine/annotation/metadata.py ✨ Enhancement +27/-0

Add COCO annotation support to metadata

• Added import for VideoRectangleAnnotation from Label Studio video format
• Registered videorectangle annotation type in task lookup dictionary
• Implemented add_coco_annotation() method to add COCO-format JSON annotations to datapoints
• Method automatically rewrites annotation filenames to match datapoint paths

dagshub/data_engine/annotation/metadata.py


3. dagshub/data_engine/model/datapoint.py 🐞 Bug fix +8/-11

Refactor blob download error handling

• Moved try/except error handling from the _get_blob() function to its caller in query_result.py
• Simplified blob download logic by removing exception wrapping at function level
• Allows proper exception propagation for better error handling upstream

dagshub/data_engine/model/datapoint.py


View more (4)
4. dagshub/data_engine/model/query_result.py ✨ Enhancement +170/-9

Add video and COCO export methods with utilities

• Added imports for COCO, MOT, and CVAT video export functions
• Added imports for video annotation types and contexts
• Implemented _resolve_annotation_field() utility method for automatic annotation field detection
• Implemented _get_all_video_annotations() method to filter and extract video annotations
• Implemented export_as_coco() method with support for custom classes and output filename
• Implemented export_as_mot() method with optional dimension parameters and MOT directory structure
• Implemented export_as_cvat_video() method with video metadata and optional dimensions
• Added try/except error handling in _get_blob_fn() to properly handle blob download failures

dagshub/data_engine/model/query_result.py


5. tests/data_engine/annotation_import/test_coco.py 🧪 Tests +218/-0

Add COCO annotation import/export tests

• Added comprehensive test suite for COCO annotation import functionality
• Tests for COCO file import, Label Studio task conversion, and metadata annotation addition
• Tests for annotation field resolution (explicit, automatic, alphabetical ordering)
• Tests for COCO export with bbox coordinates, custom classes, multiple datapoints
• Helper functions for creating and writing COCO JSON structures

tests/data_engine/annotation_import/test_coco.py


6. tests/data_engine/annotation_import/test_cvat_video.py 🧪 Tests +183/-0

Add CVAT video annotation tests

• Added comprehensive test suite for CVAT video annotation import and export
• Tests for video annotation import from XML files
• Tests for video annotation filtering and aggregation across datapoints
• Tests for CVAT video export with custom video names and multiple datapoints
• Tests for error handling when no video annotations are present
• Helper functions for creating video bounding box annotations and CVAT XML

tests/data_engine/annotation_import/test_cvat_video.py


7. tests/data_engine/annotation_import/test_mot.py 🧪 Tests +236/-0

Add MOT annotation import/export tests

• Added comprehensive test suite for MOT annotation import and export
• Tests for video annotation dictionary detection (int vs string keys)
• Tests for is_video_format property across all annotation types
• Tests for video annotation flattening with frame merging and custom video names
• Tests for MOT import from directories and zip files
• Tests for MOT export with directory structure validation and custom dimensions
• Helper functions for creating MOT directory structures and zip archives

tests/data_engine/annotation_import/test_mot.py



@coderabbitai

coderabbitai bot commented Feb 10, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


This pull request extends annotation format support in the data engine by adding COCO, MOT, and CVAT video formats alongside existing YOLO and CVAT support. Changes include new import/export loaders, video annotation flattening logic, Label Studio conversion for video annotations, and corresponding test coverage.

Changes

Cohort / File(s): Summary

  • Configuration (.gitignore): Added ignore rule for my_test.py.
  • Core Annotation Importer (dagshub/data_engine/annotation/importer.py): Extended the AnnotationType literal to include coco, mot, and cvat_video. Added loaders for COCO (load_coco_from_file), MOT (load_mot_from_dir, load_mot_from_zip), and CVAT video formats. Introduced the is_video_format property and video annotation flattening logic (_flatten_video_annotations). Extended Label Studio conversion to route video annotations via _convert_to_ls_video_tasks.
  • Annotation Metadata (dagshub/data_engine/annotation/metadata.py): Added the public method add_coco_annotation for importing COCO annotations from JSON strings. Registered VideoRectangleAnnotation in the Label Studio task lookup.
  • Data Model (dagshub/data_engine/model/datapoint.py): Changed error handling in _get_blob from catching exceptions and returning error strings to propagating exceptions to callers.
  • Export Functionality (dagshub/data_engine/model/query_result.py): Added three new public export methods: export_as_coco, export_as_mot, and export_as_cvat_video. Introduced helper methods for annotation field resolution (_resolve_annotation_field) and video annotation filtering (_get_all_video_annotations). Enhanced blob download error handling with logging and field removal on failure.
  • Test Coverage (tests/data_engine/annotation_import/test_coco.py, test_cvat_video.py, test_mot.py): Added comprehensive test modules covering COCO, CVAT video, and MOT import/export workflows. Tests validate annotation structure, format conversion, error handling, and multi-datapoint export scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Importer as AnnotationImporter
    participant Loader as Format Loader
    participant Converter as IR Converter
    participant Output as Export/LS

    User->>Importer: import_annotations(format=coco/mot/cvat_video)
    Importer->>Loader: load_coco_from_file/load_mot_from_dir/load_cvat_from_zip
    Loader->>Converter: Parse format to raw data
    Converter->>Converter: Convert to IRAnnotation objects
    
    alt is_video_format
        Converter->>Importer: Return frame-wise annotations
        Importer->>Importer: _flatten_video_annotations()
        Importer->>Output: Return video-keyed dict
    else image-based format
        Converter->>Output: Return filename-keyed dict
    end
    
    Output-->>User: Annotations ready for metadata/export
sequenceDiagram
    participant User
    participant QueryResult as QueryResult
    participant Importer as AnnotationImporter
    participant Exporter as Format Exporter
    participant FileSystem as Output Files

    User->>QueryResult: export_as_coco/mot/cvat_video(download_dir)
    QueryResult->>QueryResult: _resolve_annotation_field()
    QueryResult->>Importer: _get_all_video_annotations() or gather annotations
    QueryResult->>Exporter: Create context (CocoContext/MOTContext)
    
    alt Video Format (MOT/CVAT Video)
        Exporter->>FileSystem: Write video tracks/XML
    else COCO Format
        Exporter->>FileSystem: Write COCO JSON with image/annotation records
    end
    
    FileSystem-->>User: Return path to exported file/directory

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 1 passed
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)

@qodo-code-review

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: build (3.13)

Failed stage: Test with pytest [❌]

Failed test name: ""

Failure summary:

The action failed during pytest test collection due to an import error:

  • tests/data_engine/conftest.py:7 imports dagshub.data_engine.datasources, which eventually imports export_to_coco_file from dagshub_annotation_converter.converters.coco in dagshub/data_engine/model/query_result.py:18.
  • The module dagshub_annotation_converter.converters.coco is not present in the installed environment, causing a ModuleNotFoundError and stopping the test run (pytest exited with code 2).

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

1219:  Downloading semver-3.0.4-py3-none-any.whl (17 kB)
1220:  Building wheels for collected packages: dagshub
1221:  Building wheel for dagshub (pyproject.toml): started
1222:  Building wheel for dagshub (pyproject.toml): finished with status 'done'
1223:  Created wheel for dagshub: filename=dagshub-0.6.5-py3-none-any.whl size=263873 sha256=f1a10d3965d075e8025a833fcd414af84e521ca3e8ec3641ce0b449b39f224f7
1224:  Stored in directory: /tmp/pip-ephem-wheel-cache-3rvtws41/wheels/73/4d/4b/9f7e7fa5a39daad7f979d588e7462710e517d2de0e1e8f1ff5
1225:  Successfully built dagshub
1226:  Installing collected packages: appdirs, typing-inspection, treelib, tenacity, semver, pygments, pydantic-core, pathvalidate, mypy-extensions, mdurl, marshmallow, lxml, dacite, backoff, annotated-types, typing-inspect, requests_toolbelt, pydantic, markdown-it-py, gql, rich, dataclasses-json, dagshub-annotation-converter, dagshub
1227:  Attempting uninstall: dacite
1228:  Found existing installation: dacite 1.7.0
1229:  Uninstalling dacite-1.7.0:
1230:  Successfully uninstalled dacite-1.7.0
1231:  Successfully installed annotated-types-0.7.0 appdirs-1.4.4 backoff-2.2.1 dacite-1.6.0 dagshub-0.6.5 dagshub-annotation-converter-0.1.16 dataclasses-json-0.6.7 gql-4.0.0 lxml-6.0.2 markdown-it-py-4.0.0 marshmallow-3.26.2 mdurl-0.1.2 mypy-extensions-1.1.0 pathvalidate-3.3.1 pydantic-2.12.5 pydantic-core-2.41.5 pygments-2.19.2 requests_toolbelt-1.0.0 rich-14.3.2 semver-3.0.4 tenacity-9.1.4 treelib-1.8.0 typing-inspect-0.9.0 typing-inspection-0.4.2
1232:  [notice] A new release of pip is available: 26.0 -> 26.0.1
1233:  [notice] To update, run: pip install --upgrade pip
1234:  ##[group]Run # stop the build if there are Python syntax errors or undefined names
1235:  # stop the build if there are Python syntax errors or undefined names
1236:  flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
1237:  # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
1238:  flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
...

1327:  30    E203 whitespace before ':'
1328:  3     E701 multiple statements on one line (colon)
1329:  81
1330:  ##[group]Run pytest
1331:  pytest
1332:  shell: /usr/bin/bash -e {0}
1333:  env:
1334:  pythonLocation: /opt/hostedtoolcache/Python/3.13.11/x64
1335:  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.13.11/x64/lib
1336:  ##[endgroup]
1337:  ============================= test session starts ==============================
1338:  platform linux -- Python 3.13.11, pytest-8.3.5, pluggy-1.6.0
1339:  rootdir: /home/runner/work/client/client
1340:  configfile: pytest.ini
1341:  plugins: shutil-1.8.1, respx-0.22.0, timeout-2.4.0, env-1.1.5, anyio-4.12.1, git-1.8.0, mock-3.14.0
1342:  collected 132 items / 1 error
1343:  ==================================== ERRORS ====================================
1344:  ______________________ ERROR collecting tests/data_engine ______________________
1345:  /opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/importlib/__init__.py:88: in import_module
...

1350:  ???
1351:  <frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
1352:  ???
1353:  <frozen importlib._bootstrap>:935: in _load_unlocked
1354:  ???
1355:  /opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/_pytest/assertion/rewrite.py:185: in exec_module
1356:  exec(co, module.__dict__)
1357:  tests/data_engine/conftest.py:7: in <module>
1358:  from dagshub.data_engine import datasources
1359:  dagshub/data_engine/datasources.py:8: in <module>
1360:  from dagshub.data_engine.client.data_client import DataClient
1361:  dagshub/data_engine/client/data_client.py:29: in <module>
1362:  from dagshub.data_engine.model.query_result import QueryResult
1363:  dagshub/data_engine/model/query_result.py:18: in <module>
1364:  from dagshub_annotation_converter.converters.coco import export_to_coco_file
1365:  E   ModuleNotFoundError: No module named 'dagshub_annotation_converter.converters.coco'
1366:  =========================== short test summary info ============================
1367:  ERROR tests/data_engine - ModuleNotFoundError: No module named 'dagshub_annotation_converter.converters.coco'
1368:  !!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
1369:  =============================== 1 error in 0.80s ===============================
1370:  ##[error]Process completed with exit code 2.
1371:  Post job cleanup.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
dagshub/data_engine/model/query_result.py (1)

410-416: Consider catching more specific exceptions for blob download failures.

The broad Exception catch is flagged by static analysis (BLE001). While this prevents crashes from download errors, consider catching more specific exceptions (e.g., requests.RequestException, IOError) to avoid masking unexpected errors.

The current approach of logging and removing the field from metadata is reasonable for resilience.

♻️ Optional: More specific exception handling
         def _get_blob_fn(dp: Datapoint, field: str, url: str, blob_path: Path):
             try:
                 blob_or_path = _get_blob(url, blob_path, auth, cache_on_disk, load_into_memory, path_format)
-            except Exception as e:
+            except (IOError, OSError) as e:
                 logger.warning(f"Error while downloading blob for field {field} in datapoint {dp.path}: {e}")
                 dp.metadata.pop(field, None)
                 return
             dp.metadata[field] = blob_or_path
dagshub/data_engine/annotation/importer.py (2)

97-131: Consider extracting CVAT video detection logic to reduce duplication.

The pattern of calling loader, then checking _is_video_annotation_dict, then optionally flattening is duplicated between cvat (lines 98-102) and cvat_video (lines 124-131) handlers.

♻️ Optional: Extract common CVAT processing logic
def _process_cvat_result(self, result) -> Mapping[str, Sequence[IRAnnotationBase]]:
    if self._is_video_annotation_dict(result):
        return self._flatten_video_annotations(result)
    return result

Then use in both handlers:

elif self.annotations_type == "cvat":
    result = load_cvat_from_zip(annotations_file)
    annotation_dict = self._process_cvat_result(result)

222-226: Assertion may be too strict for edge cases.

The assert self.is_video_format will raise AssertionError if a non-video annotation somehow has None filename. Consider using a more informative exception for production code.

♻️ Suggested improvement
             for ann in anns:
                 if ann.filename is not None:
                     ann.filename = remap_func(ann.filename)
                 else:
-                    assert self.is_video_format, f"Non-video annotation has no filename: {ann}"
+                    if not self.is_video_format:
+                        raise ValueError(f"Non-video annotation unexpectedly has no filename: {ann}")
                     ann.filename = new_filename
tests/data_engine/annotation_import/test_mot.py (1)

108-108: Minor: Use next(iter()) for single-element access.

Static analysis suggests using next(iter(result.values())) instead of list(result.values())[0] for slightly better performance and clarity when accessing the single expected element.

♻️ Suggested fix
-    anns = list(result.values())[0]
+    anns = next(iter(result.values()))

Also applies to: 122-122

tests/data_engine/annotation_import/test_cvat_video.py (2)

35-35: Minor: Use next(iter()) for single-element access.

Same suggestion as in test_mot.py - use next(iter(result.values())) for cleaner single-element access.

♻️ Suggested fix
-    anns = list(result.values())[0]
+    anns = next(iter(result.values()))

151-183: Consider extracting shared test helpers to conftest.py.

The _make_video_bbox and _make_qr helper functions are duplicated across test_mot.py, test_cvat_video.py, and test_coco.py. Consider moving these to a shared conftest.py or test utilities module to reduce duplication.
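
One way to carry out this suggestion is a shared helper module imported by all three test files (plain helpers in conftest.py are not auto-injected the way fixtures are, so a utilities module is the simpler route). The module path and field names below are illustrative, not the repository's actual helpers:

```python
# tests/data_engine/annotation_import/helpers.py (hypothetical shared module)


def make_video_bbox(frame: int, x: float = 0.1, y: float = 0.1,
                    width: float = 0.2, height: float = 0.2) -> dict:
    # Shared factory for video bounding-box test data, so each test
    # module no longer carries its own copy of the helper.
    return {"frame": frame, "x": x, "y": y, "width": width, "height": height}
```

Each test module would then use `from .helpers import make_video_bbox` instead of defining the helper locally.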

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c964d0 and 98844cc.

📒 Files selected for processing (8)
  • .gitignore
  • dagshub/data_engine/annotation/importer.py
  • dagshub/data_engine/annotation/metadata.py
  • dagshub/data_engine/model/datapoint.py
  • dagshub/data_engine/model/query_result.py
  • tests/data_engine/annotation_import/test_coco.py
  • tests/data_engine/annotation_import/test_cvat_video.py
  • tests/data_engine/annotation_import/test_mot.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/data_engine/annotation_import/test_mot.py (3)
dagshub/data_engine/annotation/importer.py (3)
  • _is_video_annotation_dict (138-143)
  • is_video_format (68-69)
  • _flatten_video_annotations (145-154)
dagshub/data_engine/client/models.py (1)
  • MetadataSelectFieldSchema (82-101)
dagshub/data_engine/dtypes.py (1)
  • MetadataFieldType (20-36)
tests/data_engine/annotation_import/test_coco.py (3)
dagshub/data_engine/annotation/importer.py (2)
  • AnnotationsNotFoundError (30-32)
  • import_annotations (71-135)
dagshub/data_engine/annotation/metadata.py (2)
  • add_coco_annotation (277-297)
  • value (107-117)
dagshub/data_engine/client/models.py (1)
  • MetadataSelectFieldSchema (82-101)
dagshub/data_engine/annotation/importer.py (1)
dagshub/data_engine/model/datasource_state.py (1)
  • raw_path (99-104)
🪛 Ruff (0.14.14)
tests/data_engine/annotation_import/test_cvat_video.py

[warning] 35-35: Prefer next(iter(result.values())) over single element slice

Replace with next(iter(result.values()))

(RUF015)

tests/data_engine/annotation_import/test_mot.py

[warning] 108-108: Prefer next(iter(result.values())) over single element slice

Replace with next(iter(result.values()))

(RUF015)


[warning] 122-122: Prefer next(iter(result.values())) over single element slice

Replace with next(iter(result.values()))

(RUF015)

tests/data_engine/annotation_import/test_coco.py

[warning] 45-45: Unused function argument: mock_dagshub_auth

(ARG001)


[warning] 64-64: Unused function argument: mock_dagshub_auth

(ARG001)

dagshub/data_engine/model/query_result.py

[warning] 412-412: Do not catch blind exception: Exception

(BLE001)


[warning] 781-781: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 892-892: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 947-947: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 993-993: Avoid specifying long messages outside the exception class

(TRY003)

dagshub/data_engine/annotation/importer.py

[warning] 133-133: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.13)
🔇 Additional comments (20)
dagshub/data_engine/annotation/metadata.py (2)

23-26: LGTM - VideoRectangle task type registration.

Module-level registration of videorectangle in _task_lookup enables Label Studio video annotation parsing. This follows the established pattern for task type registration.


277-297: LGTM - COCO annotation import with filename normalization.

The implementation correctly loads COCO annotations, normalizes filenames to the datapoint's path, and updates the datapoint state. The local import avoids circular dependency issues.

One observation: all annotations from the COCO JSON are assigned to the current datapoint's path regardless of their original image grouping. This appears intentional for the use case of adding annotations to a single datapoint.

dagshub/data_engine/model/query_result.py (4)

772-784: LGTM - Well-implemented helper methods.

_get_all_video_annotations correctly filters for video-specific annotation types. _resolve_annotation_field provides deterministic field selection via sorted order with proper error handling when no annotation fields exist.


864-910: LGTM - COCO export follows established patterns.

The implementation correctly:

  • Resolves annotation field with fallback
  • Validates non-empty annotations before export
  • Prefixes filenames with source prefix
  • Creates appropriate directory structure

949-957: Dimension inference from first annotation requires verification of external library constraints.

The empty list case is properly guarded at lines 946-947. However, the concern about None dimensions deserves clarification: the code assumes video_annotations[0].image_width and video_annotations[0].image_height are non-None when falling back from the provided parameters. All test cases construct IRVideoBBoxAnnotation with explicit non-None dimensions (e.g., image_width=1920, image_height=1080), but since IRVideoBBoxAnnotation is from the external dagshub_annotation_converter library, its type constraints cannot be verified within this repository. If the external library allows None for these fields, they could propagate to MOTContext.


964-1005: No action needed. The function properly handles None dimensions by inferring them from the video annotation objects, which include width and height metadata. Tests confirm this behavior works as expected.
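
The dimension fallback under discussion can be sketched as follows, with a minimal stand-in dataclass in place of IRVideoBBoxAnnotation (whose exact type constraints live in the external dagshub-annotation-converter library). Raising on unresolvable dimensions is one way to address the reviewers' concern, rather than letting None propagate to MOTContext:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VideoAnn:
    # Minimal stand-in for IRVideoBBoxAnnotation; only the dimension
    # fields relevant to the fallback are modeled here.
    image_width: Optional[int]
    image_height: Optional[int]


def resolve_dimensions(
    anns: List[VideoAnn], width: Optional[int] = None, height: Optional[int] = None
) -> Tuple[int, int]:
    # Explicit dimensions win; otherwise infer from the first annotation.
    # Fail loudly if neither source provides a value.
    if not anns:
        raise ValueError("No video annotations to export")
    width = width if width is not None else anns[0].image_width
    height = height if height is not None else anns[0].image_height
    if width is None or height is None:
        raise ValueError("Video dimensions unavailable; pass width/height explicitly")
    return width, height
```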

dagshub/data_engine/annotation/importer.py (3)

67-69: LGTM - Clean video format detection.

The is_video_format property provides a clear way to check if the annotation type is video-oriented.


137-143: LGTM - Robust video annotation detection.

The method correctly handles edge cases: non-dict inputs, empty dicts, and determines video vs image annotations by checking the type of the first key.


371-385: LGTM - Video task conversion with proper filtering.

The method correctly filters for IRVideoBBoxAnnotation instances and skips entries without video annotations, preventing empty tasks from being created.

tests/data_engine/annotation_import/test_mot.py (4)

1-24: LGTM - Well-structured test setup.

Good use of autouse fixture to mock source_prefix for all tests in the module, ensuring consistent test isolation.


29-46: LGTM - Comprehensive edge case coverage for _is_video_annotation_dict.

Tests cover all key scenarios: int keys, str keys, empty dict, non-dict input, and mixed keys (checking first key behavior).


153-180: LGTM - Thorough export tests.

Good coverage of MOT export including directory structure validation, explicit dimension handling, and error cases for missing annotations.


186-236: LGTM - Clean helper functions.

The helper functions are well-designed for test reusability, with sensible defaults and clear parameter names.

tests/data_engine/annotation_import/test_cvat_video.py (2)

43-81: LGTM - Good coverage of video annotation filtering and aggregation.

Tests properly verify that _get_all_video_annotations filters out image annotations and correctly aggregates across multiple datapoints.


87-145: LGTM - Comprehensive CVAT video export tests.

Good coverage of success cases, error handling (no annotations, image-only annotations), custom parameters, and multi-datapoint scenarios.

tests/data_engine/annotation_import/test_coco.py (5)

29-42: LGTM - Solid COCO import tests.

Tests cover successful import and proper error handling for non-existent files.


45-58: LGTM - The unused mock_dagshub_auth parameter is intentional.

The static analysis warning (ARG001) is a false positive. The mock_dagshub_auth fixture is included to ensure authentication is mocked during these tests via pytest's fixture mechanism, even though it's not directly referenced in the test body.

Also applies to: 64-71


77-103: LGTM - Thorough _resolve_annotation_field tests.

Excellent coverage including explicit field override, auto-resolution, error on no fields, and alphabetical ordering verification.


109-182: LGTM - Comprehensive COCO export tests.

Good coverage including:

  • Bbox coordinate verification (denormalized format)
  • Error handling for empty annotations
  • Explicit class mapping
  • Custom output filename
  • Multi-datapoint export with count verification

188-218: Helper functions follow established pattern.

Same comment as for other test files - these helpers could be shared via conftest.py to reduce duplication across the test modules.


@qodo-code-review

qodo-code-review bot commented Feb 10, 2026

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (5) 📎 Requirement gaps (0)



Action required

✅ 1. remap_annotations uses assert 📘 Rule violation ⛯ Reliability
Description
remap_annotations() relies on an assert for a runtime data-integrity check when ann.filename
is missing. Assertions can be disabled and do not provide robust, user-actionable error handling for
this edge case.
Code

dagshub/data_engine/annotation/importer.py[R222-226]

+                if ann.filename is not None:
+                    ann.filename = remap_func(ann.filename)
+                else:
+                    assert self.is_video_format, f"Non-video annotation has no filename: {ann}"
+                    ann.filename = new_filename
Evidence
The checklist requires explicit handling of edge cases and meaningful failures; using assert for a
runtime validation can be skipped and is not robust error handling. The added code uses `assert
self.is_video_format` to validate missing filenames rather than raising a controlled exception.

Rule 3: Generic: Robust Error Handling and Edge Case Management
dagshub/data_engine/annotation/importer.py[222-226]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AnnotationImporter.remap_annotations()` uses an `assert` to validate a runtime edge case (`ann.filename is None`). Asserts can be disabled and are not robust error handling.
## Issue Context
This code path runs when remapping imported annotations to datasource datapoints; missing filenames should be handled predictably with a meaningful error.
## Fix Focus Areas
- dagshub/data_engine/annotation/importer.py[214-227]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
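A minimal sketch of the suggested remediation (function and exception names are illustrative, not the actual dagshub internals): replace the `assert` with a controlled exception so the check still runs under `python -O` and fails with an actionable message.

```python
# Hypothetical sketch: a controlled exception instead of an assert for the
# missing-filename edge case in remap_annotations().
class AnnotationRemapError(ValueError):
    """Raised when an imported annotation cannot be mapped to a datapoint."""


def remap_filename(filename, is_video_format, remap_func, fallback_filename):
    # Image annotations must carry a filename; video-format annotations may
    # legitimately omit it and inherit the datapoint's filename instead.
    if filename is not None:
        return remap_func(filename)
    if not is_video_format:
        raise AnnotationRemapError(
            "Non-video annotation has no filename and cannot be remapped"
        )
    return fallback_filename
```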


✅ 2. _get_blob breaks get_blob 📘 Rule violation ⛯ Reliability
Description
_get_blob() no longer returns error strings and now raises exceptions directly, but
Datapoint.get_blob() still contains logic expecting _get_blob() may return a string error. This
creates inconsistent behavior and can cause unexpected exceptions to propagate without
context/handling.
Code

dagshub/data_engine/model/datapoint.py[R306-313]

+    for attempt in Retrying(
+        retry=retry_if_exception_type(RuntimeError),
+        stop=stop_after_attempt(5),
+        wait=wait_exponential(multiplier=1, min=4, max=10),
+        before_sleep=before_sleep_log(logger, logging.WARNING),
+    ):
+        with attempt:
+            content = get()
Evidence
The checklist requires robust handling of failure points. _get_blob() now raises on failures
inside the retry loop, while Datapoint.get_blob() still checks if type(content) is str: implying
it expects _get_blob() may return an error string, which is no longer true—leading to unhandled
exception propagation and dead/incorrect logic.

Rule 3: Generic: Robust Error Handling and Edge Case Management
dagshub/data_engine/model/datapoint.py[306-313]
dagshub/data_engine/model/datapoint.py[176-193]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`_get_blob()` now raises exceptions instead of returning error strings, but `Datapoint.get_blob()` still contains code expecting string errors. This can cause unexpected exception propagation and inconsistent error semantics.
## Issue Context
`Datapoint.get_blob()` is a user-facing API surface; errors should include clear context about what failed (which column/blob) and behave consistently.
## Fix Focus Areas
- dagshub/data_engine/model/datapoint.py[176-193]
- dagshub/data_engine/model/datapoint.py[295-314]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
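One way to align the two layers (hypothetical helper, not the actual `Datapoint` code): once the low-level fetch raises instead of returning error strings, the caller should drop the `type(content) is str` check and instead catch the exception, adding context about which blob failed.

```python
# Hypothetical sketch: wrap the raising fetch and re-raise with context,
# replacing the obsolete string-error check in get_blob().
def get_blob_with_context(fetch, column, path):
    try:
        return fetch()
    except RuntimeError as e:
        raise RuntimeError(
            f"Failed to fetch blob for column {column!r} of datapoint {path!r}"
        ) from e
```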


3. output_filename allows path traversal 📘 Rule violation ⛨ Security
Description
export_as_coco() uses output_filename directly in download_dir / output_filename without
validating it is a simple filename. A caller can supply path separators (e.g., ../x.json) to write
outside the intended export directory.
Code

dagshub/data_engine/model/query_result.py[R906-907]

+        output_path = download_dir / output_filename
+        log_message("Exporting COCO annotations...")
Evidence
The checklist requires security-first input validation for external inputs. output_filename is a
caller-controlled string and is used to construct a filesystem path without sanitization, enabling
writing to unintended locations.

Rule 6: Generic: Security-First Input Validation and Data Handling
dagshub/data_engine/model/query_result.py[906-907]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`export_as_coco()` constructs `output_path` using a caller-provided `output_filename` without validation, allowing path traversal writes outside `download_dir`.
## Issue Context
This is a file write operation and should ensure the output file remains within the intended export directory.
## Fix Focus Areas
- dagshub/data_engine/model/query_result.py[864-910]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
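A minimal validation sketch (assuming `pathlib`-based paths; names are illustrative): reject anything that is not a bare filename, and double-check that the resolved path stays inside the export directory.

```python
from pathlib import Path


def safe_output_path(download_dir, output_filename):
    # Accept only a bare filename: no separators, no parent references.
    if Path(output_filename).name != output_filename or output_filename in ("", ".", ".."):
        raise ValueError(f"output_filename must be a plain file name, got {output_filename!r}")
    base = Path(download_dir).resolve()
    candidate = (base / output_filename).resolve()
    # Defense in depth: the resolved path must remain inside download_dir.
    if base not in candidate.parents:
        raise ValueError(f"Refusing to write outside {base}: {candidate}")
    return candidate
```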


4. Repo import path mismatch 🐞 Bug ✓ Correctness
Description
When importing annotations from the repo, downloads use keep_source_prefix=True (which preserves
remote subdirectories), but import code assumes the downloaded file is at the temp dir root by name
only. If annotations_file includes directories (e.g., "anns/annotations.json"), imports will fail
with FileNotFoundError (new coco/mot/cvat_video paths are affected too).
Code

dagshub/data_engine/annotation/importer.py[R159-162]

+        if self.annotations_type in ("cvat", "cvat_video"):
           repoApi.download(self.annotations_file.as_posix(), dest_dir, keep_source_prefix=True)
       elif self.annotations_type == "yolo":
           # Download the dataset .yaml file and the images + annotations
Evidence
RepoAPI.download with keep_source_prefix=True writes a single requested file into
local_path/remote_path (including remote directories). AnnotationImporter, after download, resets
annotations_file to tmp_dir_path/annotations_file.name, dropping those directories, so the loader is
pointed at a path that doesn't exist when the remote path isn’t a flat filename.

dagshub/data_engine/annotation/importer.py[84-89]
dagshub/data_engine/annotation/importer.py[156-173]
dagshub/common/api/repo.py[443-456]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Repo-based annotation import can fail when the remote annotation path contains directories. `RepoAPI.download(..., keep_source_prefix=True)` preserves the remote directory structure under the temp dir, but the importer later assumes the file is at `tmp_dir/<filename>`.
### Issue Context
This affects newly added formats (COCO/MOT/CVAT video) because they also use `keep_source_prefix=True`, and it can also affect existing types when annotations are stored under subdirectories in the repo.
### Fix Focus Areas
- dagshub/data_engine/annotation/importer.py[84-89]
- dagshub/data_engine/annotation/importer.py[156-173]
- dagshub/common/api/repo.py[443-456]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
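The path the importer should look at can be computed by mirroring the full remote path instead of just the basename. A hypothetical helper (not the actual importer code):

```python
from pathlib import Path, PurePosixPath


def downloaded_annotation_path(tmp_dir, remote_annotations_file):
    # keep_source_prefix=True mirrors the remote directory layout under
    # tmp_dir, so join the full remote path rather than just its basename.
    return Path(tmp_dir) / PurePosixPath(remote_annotations_file)
```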


5. Export mutates filenames 🐞 Bug ✓ Correctness
Description
export_as_yolo/export_as_coco modify annotation objects in-place by rewriting ann.filename to
include source_prefix. This permanently mutates datapoint metadata (double-prefixing on repeated
exports) and can generate OS-specific separators due to os.path.join with a PurePosixPath prefix.
Code

dagshub/data_engine/model/query_result.py[R898-901]

+        # Add the source prefix to all annotations
+        for ann in annotations:
+            ann.filename = os.path.join(self.datasource.source.source_prefix, ann.filename)
+
Evidence
_get_all_annotations returns live references from dp.metadata[annotation_field].annotations;
export_as_yolo/export_as_coco then overwrites ann.filename on those objects. source_prefix is a
PurePosixPath, so os.path.join may emit platform-specific separators (e.g., backslashes on Windows),
breaking later logic like checking for "images/".

dagshub/data_engine/model/query_result.py[762-770]
dagshub/data_engine/model/query_result.py[846-852]
dagshub/data_engine/model/query_result.py[898-905]
dagshub/data_engine/model/datasource_state.py[106-113]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`export_as_yolo()` / `export_as_coco()` rewrite `ann.filename` in-place, which mutates datapoint metadata and can cause repeated exports to double-prefix paths. Additionally, using `os.path.join()` with a `PurePosixPath` prefix can produce platform-specific separators.
### Issue Context
Annotations come from `_get_all_annotations()`, which returns references to the `MetadataAnnotations.annotations` list stored on datapoints.
### Fix Focus Areas
- dagshub/data_engine/model/query_result.py[762-770]
- dagshub/data_engine/model/query_result.py[846-852]
- dagshub/data_engine/model/query_result.py[898-905]
- dagshub/data_engine/model/datasource_state.py[106-113]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
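A non-mutating sketch of the prefixing step (illustrative helper, not the actual export code): build a new string per annotation instead of rewriting `ann.filename` in place, and join with `PurePosixPath` so the separator is `/` on every platform.

```python
from pathlib import PurePosixPath


def prefixed_filename(source_prefix, filename):
    # Returns a new string; never mutates the annotation object, so repeated
    # exports cannot double-prefix. os.path.join with a PurePosixPath prefix
    # would emit backslashes on Windows; PurePosixPath joining does not.
    return str(PurePosixPath(source_prefix) / filename)
```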


6. Export crashes on videos 🐞 Bug ✓ Correctness
Description
export_as_yolo/export_as_coco assume every annotation has a non-null filename, but video IR
annotations can have filename unset. If a user accidentally exports a mixed or video-only annotation
field as COCO/YOLO, os.path.join(prefix, None) will raise a TypeError and abort the export.
Code

dagshub/data_engine/model/query_result.py[R898-901]

+        # Add the source prefix to all annotations
+        for ann in annotations:
+            ann.filename = os.path.join(self.datasource.source.source_prefix, ann.filename)
+
Evidence
Video annotations are represented by IRVideoBBoxAnnotation and are constructed without filename in
tests. _get_all_annotations does not filter out non-image annotations and
export_as_coco/export_as_yolo unconditionally join source_prefix with ann.filename, so a None
filename will crash.

tests/data_engine/annotation_import/test_cvat_video.py[151-158]
dagshub/data_engine/model/query_result.py[762-770]
dagshub/data_engine/model/query_result.py[898-901]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
COCO/YOLO export paths assume every annotation has `filename` set. Video annotations can have `filename=None`, causing a `TypeError` during path prefixing if the field contains video annotations.
### Issue Context
With new video support, it's easier to end up with video annotations in annotation fields, and `_get_all_annotations()` does not filter types.
### Fix Focus Areas
- dagshub/data_engine/model/query_result.py[762-775]
- dagshub/data_engine/model/query_result.py[846-852]
- dagshub/data_engine/model/query_result.py[898-901]
- tests/data_engine/annotation_import/test_cvat_video.py[151-158]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
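A guard sketch for the export entry points (hypothetical function, not the actual `query_result` code): fail fast with a clear message when the field contains video annotations, instead of letting `os.path.join(prefix, None)` raise a bare `TypeError`.

```python
# Hypothetical sketch: validate annotations before a COCO/YOLO export.
def require_image_annotations(annotations):
    unnamed = [a for a in annotations if getattr(a, "filename", None) is None]
    if unnamed:
        raise ValueError(
            f"{len(unnamed)} annotation(s) have no filename; "
            "video annotations cannot be exported as COCO/YOLO"
        )
    return annotations
```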



Remediation recommended

7. log_message writes unstructured logs 📘 Rule violation ⛨ Security
Description
New logging added via log_message() emits free-form strings and includes datapoint.path, which
may be sensitive depending on dataset naming conventions. This does not meet the structured,
non-sensitive logging requirement.
Code

dagshub/data_engine/annotation/metadata.py[R296-297]

+        log_message(f"Added {len(new_anns)} COCO annotation(s) to datapoint {self.datapoint.path}")
+        self._update_datapoint()
Evidence
The checklist requires structured logs and avoiding sensitive data in log output. log_message()
prints and logs a plain string, and the new message includes the raw datapoint path, which is
user/data-derived and may contain sensitive identifiers.

Rule 5: Generic: Secure Logging Practices
dagshub/common/helpers.py[100-107]
dagshub/data_engine/annotation/metadata.py[277-297]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`log_message()` produces unstructured, free-form logs and the newly added message logs a raw datapoint path, which may be sensitive.
## Issue Context
The compliance requirement calls for structured logs and avoiding sensitive data in logs.
## Fix Focus Areas
- dagshub/data_engine/annotation/metadata.py[277-297]
- dagshub/common/helpers.py[100-107]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
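One structured-logging alternative (a sketch using the stdlib `logging` module; logger name and field names are assumptions): keep the message static and move variable data into `extra`, so formatters and handlers decide whether the potentially sensitive datapoint path ever reaches log output.

```python
import logging

logger = logging.getLogger("dagshub.data_engine.annotation")


def log_annotations_added(count, datapoint_path):
    # Static message, structured fields: handlers can redact or drop the
    # path without changing the call site.
    logger.info(
        "Added COCO annotations to datapoint",
        extra={"annotation_count": count, "datapoint_path": datapoint_path},
    )
```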


8. result variable too generic 📘 Rule violation ✓ Correctness
Description
import_annotations() introduces a generic variable name result for loaded CVAT annotations,
which reduces readability and makes intent less self-documenting. More descriptive naming would
better communicate whether this is image annotations vs video frame annotations.
Code

dagshub/data_engine/annotation/importer.py[R98-102]

+                result = load_cvat_from_zip(annotations_file)
+                if self._is_video_annotation_dict(result):
+                    annotation_dict = self._flatten_video_annotations(result)
+                else:
+                    annotation_dict = result
Evidence
The checklist requires meaningful, self-documenting identifiers and flags generic names like
result. The new code stores loaded annotations in a variable named result, which is
non-descriptive in a function that handles multiple annotation formats and shapes.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
dagshub/data_engine/annotation/importer.py[97-103]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A generic variable name (`result`) is used for loaded CVAT annotations, reducing readability in multi-format import logic.
## Issue Context
This function handles several formats (YOLO/CVAT/COCO/MOT/CVAT video), so descriptive names help avoid confusion about data shape and semantics.
## Fix Focus Areas
- dagshub/data_engine/annotation/importer.py[90-133]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


9. Video task save corruption 🐞 Bug ✓ Correctness
Description
metadata.py now enables parsing of Label Studio "videorectangle" tasks, but
MetadataAnnotations.to_ls_task always serializes tasks as image tasks (task.data["image"]). Editing
meta/annotations for a video field can therefore reserialize video tasks incorrectly and corrupt
stored annotation blobs.
Code

dagshub/data_engine/annotation/metadata.py[R23-27]

+from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
+from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup
+
+_task_lookup["videorectangle"] = VideoRectangleAnnotation
+
Evidence
The PR registers videorectangle parsing by mutating the converter task_lookup. Any mutation to
MetadataAnnotations.meta triggers _update_datapoint, which calls to_ls_task. to_ls_task always sets
the LS task’s data to an image URL and then adds all IR annotations, with no branching for video
tasks.

dagshub/data_engine/annotation/metadata.py[23-27]
dagshub/data_engine/annotation/metadata.py[91-104]
dagshub/data_engine/annotation/metadata.py[29-37]
dagshub/data_engine/annotation/metadata.py[142-148]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Video Label Studio tasks can now be parsed, but saving/updating `MetadataAnnotations` always serializes as an image task (`task.data['image']`). Any user mutation (e.g., setting `anns.meta[...]`) will reserialize and potentially corrupt video task blobs.
### Issue Context
`AnnotationMetaDict.__setitem__` triggers `_update_datapoint()`, which calls `to_ls_task()`.
### Fix Focus Areas
- dagshub/data_engine/annotation/metadata.py[23-27]
- dagshub/data_engine/annotation/metadata.py[29-37]
- dagshub/data_engine/annotation/metadata.py[91-104]
- dagshub/data_engine/annotation/metadata.py[142-148]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
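The core of the fix would be to branch on the annotation kind when building the LS task's `data` payload. A deliberately minimal sketch (hypothetical helper; the real `to_ls_task` builds the full task object):

```python
# Hypothetical sketch: choose the Label Studio data key by annotation kind,
# so video fields round-trip with "video" instead of always "image".
def ls_task_data(download_url, has_video_annotations):
    key = "video" if has_video_annotations else "image"
    return {key: download_url}
```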


10. Blob errors become silent 🐞 Bug ✓ Correctness
Description
On blob download failure, get_blob_fields drops the metadata key and later code fills missing
annotation fields with empty MetadataAnnotations. This can silently turn transient download failures
into empty annotations, making debugging and correctness validation harder.
Code

dagshub/data_engine/model/query_result.py[R410-415]

+            try:
+                blob_or_path = _get_blob(url, blob_path, auth, cache_on_disk, load_into_memory, path_format)
+            except Exception as e:
+                logger.warning(f"Error while downloading blob for field {field} in datapoint {dp.path}: {e}")
+                dp.metadata.pop(field, None)
+                return
Evidence
The new exception handler removes the field from metadata on failure. During annotation conversion,
missing annotation fields are replaced with empty MetadataAnnotations objects, which hides the fact
that a download failed (beyond a warning log) and changes semantics from “failed to load” to “no
annotations”.

dagshub/data_engine/model/query_result.py[409-416]
dagshub/data_engine/model/query_result.py[444-469]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Blob download failures are currently converted into missing metadata keys, and annotation fields then become empty `MetadataAnnotations`, masking errors.
### Issue Context
The code logs a warning but changes runtime semantics from “failed to load blob” to “no annotations”.
### Fix Focus Areas
- dagshub/data_engine/model/query_result.py[409-416]
- dagshub/data_engine/model/query_result.py[444-469]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
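One way to preserve the error semantics (a sketch with hypothetical names, not the actual `get_blob_fields` code): store an explicit failure marker in metadata instead of popping the key, so downstream conversion can tell "failed to load" apart from "no annotations".

```python
# Hypothetical sketch: a sentinel that records why a blob field is absent.
class BlobLoadFailure:
    def __init__(self, field, error):
        self.field = field
        self.error = error

    def __repr__(self):
        return f"BlobLoadFailure(field={self.field!r}, error={self.error!r})"


def load_blob_field(metadata, field, download):
    try:
        metadata[field] = download()
    except Exception as e:  # surface the failure instead of silently dropping it
        metadata[field] = BlobLoadFailure(field, e)
```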



Comment on lines +898 to +901
# Add the source prefix to all annotations
for ann in annotations:
ann.filename = os.path.join(self.datasource.source.source_prefix, ann.filename)


Action required

5. Export mutates filenames 🐞 Bug ✓ Correctness

export_as_yolo/export_as_coco modify annotation objects in-place by rewriting ann.filename to
include source_prefix. This permanently mutates datapoint metadata (double-prefixing on repeated
exports) and can generate OS-specific separators due to os.path.join with a PurePosixPath prefix.
Agent Prompt
### Issue description
`export_as_yolo()` / `export_as_coco()` rewrite `ann.filename` in-place, which mutates datapoint metadata and can cause repeated exports to double-prefix paths. Additionally, using `os.path.join()` with a `PurePosixPath` prefix can produce platform-specific separators.

### Issue Context
Annotations come from `_get_all_annotations()`, which returns references to the `MetadataAnnotations.annotations` list stored on datapoints.

### Fix Focus Areas
- dagshub/data_engine/model/query_result.py[762-770]
- dagshub/data_engine/model/query_result.py[846-852]
- dagshub/data_engine/model/query_result.py[898-905]
- dagshub/data_engine/model/datasource_state.py[106-113]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@deanp70
Member Author

deanp70 commented Feb 15, 2026

Note to ourselves - need to update the docs:

  1. https://dagshub.com/docs/use_cases/annotation/#supported-formats-for-annotation-imports

To have the correct list of supported formats


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
dagshub/data_engine/model/query_result.py (1)

954-962: Minor redundancy in dimension checks.

The elif video_annotations: checks at lines 957 and 961 are always true since we've already verified video_annotations is non-empty at line 951. This is harmless but slightly redundant.

Suggested simplification
         context = MOTContext()
         if image_width is not None:
             context.image_width = image_width
-        elif video_annotations:
+        else:
             context.image_width = video_annotations[0].image_width
         if image_height is not None:
             context.image_height = image_height
-        elif video_annotations:
+        else:
             context.image_height = video_annotations[0].image_height
tests/data_engine/annotation_import/test_mot.py (1)

100-129: Consider using next(iter(...)) for single-element access.

Static analysis suggests using next(iter(result.values())) instead of list(result.values())[0] for better performance and clarity when accessing a single element.

Suggested fix
-    anns = list(result.values())[0]
+    anns = next(iter(result.values()))
tests/data_engine/annotation_import/test_cvat_video.py (1)

28-39: Consider using next(iter(...)) for single-element access.

Same pattern as MOT tests - next(iter(result.values())) is preferred over list(result.values())[0].

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98844cc and 898fc22.

📒 Files selected for processing (7)
  • dagshub/data_engine/annotation/importer.py
  • dagshub/data_engine/annotation/metadata.py
  • dagshub/data_engine/model/datapoint.py
  • dagshub/data_engine/model/query_result.py
  • tests/data_engine/annotation_import/test_coco.py
  • tests/data_engine/annotation_import/test_cvat_video.py
  • tests/data_engine/annotation_import/test_mot.py
🧰 Additional context used
🧬 Code graph analysis (3)
dagshub/data_engine/annotation/metadata.py (2)
dagshub/streaming/filesystem.py (1)
  • path (1119-1124)
dagshub/common/helpers.py (1)
  • log_message (100-107)
dagshub/data_engine/annotation/importer.py (2)
dagshub/data_engine/model/datasource.py (1)
  • source (180-181)
dagshub/data_engine/model/datasource_state.py (1)
  • raw_path (99-104)
dagshub/data_engine/model/query_result.py (2)
dagshub/data_engine/model/datapoint.py (1)
  • _get_blob (264-322)
dagshub/common/helpers.py (1)
  • log_message (100-107)
🪛 Ruff (0.15.1)
dagshub/data_engine/annotation/importer.py

[warning] 131-131: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 224-224: Avoid specifying long messages outside the exception class

(TRY003)

tests/data_engine/annotation_import/test_cvat_video.py

[warning] 36-36: Prefer next(iter(result.values())) over single element slice

Replace with next(iter(result.values()))

(RUF015)

dagshub/data_engine/model/query_result.py

[warning] 412-412: Do not catch blind exception: Exception

(BLE001)


[warning] 786-786: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 897-897: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 952-952: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 998-998: Avoid specifying long messages outside the exception class

(TRY003)

tests/data_engine/annotation_import/test_coco.py

[warning] 28-28: Expected a statement

(invalid-syntax)


[warning] 28-28: Expected a statement

(invalid-syntax)


[warning] 28-28: Expected a statement

(invalid-syntax)


[warning] 28-28: Expected a statement

(invalid-syntax)


[warning] 35-35: Expected a statement

(invalid-syntax)


[warning] 35-35: Expected a statement

(invalid-syntax)


[warning] 35-35: Expected a statement

(invalid-syntax)


[warning] 35-35: Expected a statement

(invalid-syntax)


[warning] 35-36: Expected a statement

(invalid-syntax)


[warning] 44-44: Expected a statement

(invalid-syntax)


[warning] 44-44: Expected a statement

(invalid-syntax)


[warning] 44-44: Expected a statement

(invalid-syntax)


[warning] 44-44: Expected a statement

(invalid-syntax)


[warning] 44-44: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


[warning] 44-44: Expected ,, found name

(invalid-syntax)


[warning] 45-45: Unexpected indentation

(invalid-syntax)


[warning] 52-52: Expected a statement

(invalid-syntax)


[warning] 52-52: Expected a statement

(invalid-syntax)


[warning] 52-52: Expected a statement

(invalid-syntax)


[warning] 52-52: Expected a statement

(invalid-syntax)


[warning] 63-63: Expected ,, found ==

(invalid-syntax)


[warning] 63-63: Expected ,, found ==

(invalid-syntax)


[warning] 63-63: Expected ,, found ==

(invalid-syntax)


[warning] 63-63: Expected ,, found =

(invalid-syntax)


[warning] 63-64: Expected ), found newline

(invalid-syntax)


[warning] 69-69: Expected ,, found >>

(invalid-syntax)


[warning] 69-69: Expected ,, found >>

(invalid-syntax)


[warning] 69-69: Expected ,, found >>

(invalid-syntax)


[warning] 69-69: Expected ,, found >

(invalid-syntax)


[warning] 69-69: Positional argument cannot follow keyword argument

(invalid-syntax)


[warning] 69-69: Expected ,, found name

(invalid-syntax)


[warning] 69-69: Positional argument cannot follow keyword argument

(invalid-syntax)


[warning] 69-69: Expected ,, found name

(invalid-syntax)


[warning] 70-70: Expected ,, found name

(invalid-syntax)


[warning] 74-74: Expected a statement

(invalid-syntax)


[warning] 74-74: Expected a statement

(invalid-syntax)


[warning] 74-74: Expected a statement

(invalid-syntax)


[warning] 74-74: Expected a statement

(invalid-syntax)


[warning] 75-75: Expected a statement

(invalid-syntax)


[warning] 75-75: Expected a statement

(invalid-syntax)


[warning] 75-75: Expected a statement

(invalid-syntax)


[warning] 75-75: Expected a statement

(invalid-syntax)


[warning] 75-76: Expected a statement

(invalid-syntax)


[warning] 77-77: Expected a statement

(invalid-syntax)


[warning] 77-77: Expected a statement

(invalid-syntax)


[warning] 77-77: Expected a statement

(invalid-syntax)


[warning] 77-77: Expected a statement

(invalid-syntax)


[warning] 77-77: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


[warning] 77-77: Expected ,, found name

(invalid-syntax)


[warning] 78-78: Unexpected indentation

(invalid-syntax)


[warning] 83-83: Expected a statement

(invalid-syntax)


[warning] 83-83: Expected a statement

(invalid-syntax)


[warning] 83-83: Expected a statement

(invalid-syntax)


[warning] 83-83: Expected a statement

(invalid-syntax)


[warning] 84-84: Unexpected indentation

(invalid-syntax)


[warning] 85-85: Expected a statement

(invalid-syntax)


[warning] 85-85: Expected a statement

(invalid-syntax)


[warning] 85-85: Expected a statement

(invalid-syntax)


[warning] 85-85: Expected a statement

(invalid-syntax)


[warning] 85-86: Expected a statement

(invalid-syntax)


[warning] 86-86: Expected a statement

(invalid-syntax)


[warning] 86-86: Expected a statement

(invalid-syntax)


[warning] 86-86: Expected a statement

(invalid-syntax)


[warning] 86-86: Expected a statement

(invalid-syntax)


[warning] 86-86: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


[warning] 86-86: Expected ,, found name

(invalid-syntax)


[warning] 92-92: Expected a statement

(invalid-syntax)


[warning] 92-92: Expected a statement

(invalid-syntax)


[warning] 92-92: Expected a statement

(invalid-syntax)


[warning] 92-92: Expected a statement

(invalid-syntax)


[warning] 95-95: Expected a statement

(invalid-syntax)


[warning] 95-95: Expected a statement

(invalid-syntax)


[warning] 95-95: Expected a statement

(invalid-syntax)


[warning] 95-95: Expected a statement

(invalid-syntax)


[warning] 95-96: Expected a statement

(invalid-syntax)


[warning] 98-98: Expected a statement

(invalid-syntax)


[warning] 98-98: Expected a statement

(invalid-syntax)


[warning] 98-98: Expected a statement

(invalid-syntax)


[warning] 98-98: Expected a statement

(invalid-syntax)


[warning] 98-98: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


[warning] 98-98: Expected ,, found name

(invalid-syntax)


[warning] 99-99: Unexpected indentation

(invalid-syntax)


[warning] 104-104: Expected a statement

(invalid-syntax)


[warning] 104-104: Expected a statement

(invalid-syntax)


[warning] 104-104: Expected a statement

(invalid-syntax)


[warning] 104-104: Expected a statement

(invalid-syntax)


[warning] 105-105: Unexpected indentation

(invalid-syntax)


[warning] 106-106: Expected a statement

(invalid-syntax)


[warning] 106-106: Expected a statement

(invalid-syntax)


[warning] 106-106: Expected a statement

(invalid-syntax)


[warning] 106-106: Expected a statement

(invalid-syntax)


[warning] 106-107: Expected a statement

(invalid-syntax)


[warning] 107-107: Unexpected indentation

(invalid-syntax)


[warning] 110-110: Expected a statement

(invalid-syntax)


[warning] 123-123: Expected a statement

(invalid-syntax)


[warning] 123-123: Expected a statement

(invalid-syntax)


[warning] 123-123: Expected a statement

(invalid-syntax)


[warning] 123-123: Expected a statement

(invalid-syntax)


[warning] 123-123: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


[warning] 123-123: Expected ,, found name

(invalid-syntax)


[warning] 129-129: Expected a statement

(invalid-syntax)


[warning] 129-129: Expected a statement

(invalid-syntax)


[warning] 129-129: Expected a statement

(invalid-syntax)


[warning] 129-129: Expected a statement

(invalid-syntax)


[warning] 131-131: Expected an indented block after function definition

(invalid-syntax)


[warning] 131-131: Expected a statement

(invalid-syntax)


[warning] 131-131: Expected a statement

(invalid-syntax)


[warning] 131-335: repeated parse errors ("Expected a statement", "Expected an indented block after function definition", "Unexpected indentation", "Simple statements must be separated by newlines or semicolons", "Expected ,, found name") at lines 131-134, 138, 140-143, 147, 149-152, 157, 175-176, 180-181, 194-195, 199-201, 208, 228-229, 237, 288-289, 309, 320, and 334-335

(invalid-syntax)

tests/data_engine/annotation_import/test_mot.py

[warning] 108, 127: Prefer next(iter(result.values())) over single element slice

(RUF015)

🔇 Additional comments (15)
dagshub/data_engine/annotation/metadata.py (1)

282-302: LGTM!

The add_coco_annotation method correctly loads COCO annotations, rewrites filenames to match the datapoint path, and triggers the datapoint update. The lazy import helps avoid circular dependencies.
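The filename-rewriting step can be pictured with a small stand-alone sketch (`rewrite_coco_filenames` is a hypothetical helper, not the method's actual implementation): load a COCO JSON string and point every `images[].file_name` at the datapoint's path so the annotations attach to the right file.

```python
import json


def rewrite_coco_filenames(coco_json: str, datapoint_path: str) -> dict:
    # Illustrative only: repoint each COCO image entry at the datapoint path
    coco = json.loads(coco_json)
    for image in coco.get("images", []):
        image["file_name"] = datapoint_path
    return coco


coco = rewrite_coco_filenames(
    json.dumps(
        {
            "images": [{"id": 1, "file_name": "local/cat.jpg"}],
            "annotations": [],
            "categories": [],
        }
    ),
    "data/images/cat.jpg",
)
```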

dagshub/data_engine/model/query_result.py (4)

410-414: Catching broad Exception is appropriate here given the retry behavior.

The broad exception catch handles both the original exception and a potential RetryError from tenacity. It would be cleaner if reraise=True were added to the Retrying call in _get_blob, allowing a more specific catch, but the current approach works as a defensive measure.

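The effect of `reraise=True` can be sketched with a pure-Python stand-in for tenacity's retry loop (`retry_call` and this `RetryError` are illustrative, not tenacity's actual classes): on exhaustion, the last exception is either wrapped in `RetryError` or, with `reraise=True`, re-raised directly so callers can catch the specific type.

```python
class RetryError(Exception):
    """Wraps the last exception after retries are exhausted (sketch)."""

    def __init__(self, last_exc):
        super().__init__(repr(last_exc))
        self.last_exc = last_exc


def retry_call(fn, attempts=3, reraise=False):
    # Minimal sketch of tenacity-style retry semantics, not the real library
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_exc = e
    if reraise:
        raise last_exc
    raise RetryError(last_exc)


def always_fails():
    raise ValueError("boom")


# Without reraise, callers see RetryError; with reraise=True, ValueError.
try:
    retry_call(always_fails, attempts=2)
    caught = None
except RetryError:
    caught = "RetryError"

try:
    retry_call(always_fails, attempts=2, reraise=True)
    caught_reraise = None
except ValueError:
    caught_reraise = "ValueError"
```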

777-789: LGTM!

The helper methods are well-designed. _get_all_video_annotations correctly filters for video annotations, and _resolve_annotation_field provides deterministic field selection with proper error handling.


869-915: LGTM!

The export_as_coco method follows established patterns from export_as_yolo, with proper file downloads, annotation field resolution, and category handling.


969-1010: LGTM!

The export_as_cvat_video method follows the established export pattern with proper annotation field resolution and error handling.

dagshub/data_engine/annotation/importer.py (5)

67-69: LGTM!

The is_video_format property correctly identifies video-oriented annotation types.
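A minimal sketch of what such a property might look like (`ImporterSketch` and the exact `AnnotationType` members are assumptions, not the real class):

```python
from typing import Literal

AnnotationType = Literal["yolo", "coco", "mot", "cvat_video"]


class ImporterSketch:
    # Hypothetical stand-in for AnnotationImporter, showing only the property
    def __init__(self, annotation_type: AnnotationType):
        self.annotation_type = annotation_type

    @property
    def is_video_format(self) -> bool:
        # MOT and CVAT Video carry per-frame annotations; COCO/YOLO are per-image
        return self.annotation_type in ("mot", "cvat_video")
```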


97-131: LGTM!

The format-specific handling is well-organized with proper branching for different input types (zip vs directory/file) and video annotation flattening.


135-152: LGTM!

The helper methods correctly detect video annotation dictionaries and flatten frame-indexed annotations appropriately.
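The flattening step can be sketched as follows (the `{frame_index: [annotations]}` shape and the `"frame"` key are assumptions for illustration, not the converter's actual data model):

```python
def flatten_video_annotations(per_frame: dict) -> list:
    # Collapse {frame_index: [annotations]} into one flat, frame-ordered list,
    # tagging each annotation with its source frame
    flat = []
    for frame in sorted(per_frame):
        for ann in per_frame[frame]:
            flat.append({**ann, "frame": frame})
    return flat


flat = flatten_video_annotations({2: [{"label": "car"}], 0: [{"label": "cat"}]})
```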


220-225: LGTM!

Good defensive handling for video annotations where ann.filename may be None, falling back to new_filename with proper validation for non-video formats.


370-384: LGTM!

The video task conversion correctly filters for video annotations and uses the appropriate converter. Taking only the first task is acceptable since annotations are grouped by filename.

tests/data_engine/annotation_import/test_mot.py (2)

1-24: LGTM!

Good test setup with proper fixtures for mocking source prefix.


134-172: LGTM!

Good test coverage for MOT export including success cases, error handling for missing annotations, and explicit dimension handling.

tests/data_engine/annotation_import/test_cvat_video.py (3)

1-23: LGTM!

Proper test setup mirroring the MOT test structure.


73-86: LGTM!

The Label Studio round-trip test is valuable for validating that video annotations can be serialized and deserialized correctly.


92-127: LGTM!

Good test coverage for CVAT video export including success cases, error handling, and custom video name propagation.

Comment on lines 23 to 31
from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup

_task_lookup["videorectangle"] = VideoRectangleAnnotation

from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup

_task_lookup["videorectangle"] = VideoRectangleAnnotation

⚠️ Potential issue | 🟡 Minor

Remove duplicate imports and registration.

Lines 23-26 and 28-31 are exact duplicates. This appears to be a merge conflict artifact or copy-paste error that needs to be cleaned up.

Proposed fix
-from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
-from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup
-
-_task_lookup["videorectangle"] = VideoRectangleAnnotation
-
 from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
 from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup

 _task_lookup["videorectangle"] = VideoRectangleAnnotation
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
-from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup
-
-_task_lookup["videorectangle"] = VideoRectangleAnnotation
-
 from dagshub_annotation_converter.formats.label_studio.videorectangle import VideoRectangleAnnotation
 from dagshub_annotation_converter.formats.label_studio.task import task_lookup as _task_lookup

 _task_lookup["videorectangle"] = VideoRectangleAnnotation

Comment on lines 28 to 44
<<<<<<< HEAD
# --- import ---


def test_import_coco_from_file(ds, tmp_path):
_write_coco(tmp_path, _make_coco_json())
importer = AnnotationImporter(ds, "coco", tmp_path / "annotations.json", load_from="disk")
=======
# --- COCO import ---


def test_import_coco_from_file(ds, tmp_path):
coco_file = tmp_path / "annotations.json"
coco_file.write_text(json.dumps(_make_coco_json()))

importer = AnnotationImporter(ds, "coco", coco_file, load_from="disk")
>>>>>>> 550a453 (initial commite)

⚠️ Potential issue | 🔴 Critical

Unresolved merge conflicts must be resolved before merging.

This file contains git merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> 550a453) throughout. The file has invalid Python syntax and will fail to parse/execute. Please resolve all merge conflicts.

Affected regions include lines 28-44, 52-77, 74-86, 92-107, 129-155, 157-180, 194-228, 237-335, and more.

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 28-44: repeated parse errors ("Expected a statement", "Simple statements must be separated by newlines or semicolons", "Expected ,, found name") at lines 28, 35-36, and 44

(invalid-syntax)

# Conflicts:
#	dagshub/data_engine/annotation/importer.py
#	dagshub/data_engine/annotation/metadata.py
#	dagshub/data_engine/model/datapoint.py
#	dagshub/data_engine/model/query_result.py
#	tests/data_engine/annotation_import/test_cvat_video.py
#	tests/data_engine/annotation_import/test_mot.py
@deanp70 deanp70 force-pushed the video_coco_converters branch from 55dc220 to 89d6395 on February 22, 2026 15:28

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
tests/data_engine/annotation_import/test_mot.py (1)

107-109: Consider using next(iter(...)) for single-element access.

For accessing the first (and only expected) element from a dict, next(iter(result.values())) is more idiomatic and avoids creating an intermediate list.

♻️ Proposed fix
-    anns = list(result.values())[0]
+    anns = next(iter(result.values()))

And similarly at line 122:

-    assert len(list(result.values())[0]) == 2
+    assert len(next(iter(result.values()))) == 2

Also applies to: 121-122
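The difference between the two idioms, in a self-contained snippet:

```python
result = {"videos/clip.mp4": ["ann1", "ann2"]}

# Builds a throwaway list of all values just to take the first one:
first_via_list = list(result.values())[0]

# Pulls the first value straight off the iterator, no intermediate list:
first_via_iter = next(iter(result.values()))
```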

dagshub/data_engine/model/query_result.py (1)

1117-1120: Redundant variable ann_path.

The variable ann_path on line 1117 is assigned but only used to extract .name and .stem. However, ann_filename is a string and you could use Path(ann_filename).name and Path(ann_filename).stem directly. This isn't a bug, but the assignment could be simplified.

♻️ Suggested simplification
                     assert ann_filename is not None
                     local_video = self._prepare_video_file_for_export(local_download_root, ann_filename)
                     if local_video is None:
                         raise FileNotFoundError(
                             f"Could not find local downloaded video file for '{ann_filename}' "
                             f"under '{local_download_root}'."
                         )
-                    ann_path = Path(ann_filename)
                     video_files[ann_filename] = local_video
-                    video_files[ann_path.name] = local_video
-                    video_files[ann_path.stem] = local_video
+                    video_files[Path(ann_filename).name] = local_video
+                    video_files[Path(ann_filename).stem] = local_video
tests/data_engine/annotation_import/test_cvat_video.py (2)

235-236: Use raw string for regex pattern in match=.

The pattern "missing.mp4" contains a dot which is a regex metacharacter matching any character. While it works here, using a raw string makes the intent clearer.

♻️ Proposed fix
-    with pytest.raises(FileNotFoundError, match="missing.mp4"):
+    with pytest.raises(FileNotFoundError, match=r"missing\.mp4"):
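A quick illustration of why the unescaped pattern is risky (pytest's `match=` uses `re.search` under the hood):

```python
import re

# An unescaped "." matches any character, not just a literal dot:
loose = re.search("missing.mp4", "missingXmp4")       # unintended match
strict = re.search(r"missing\.mp4", "missingXmp4")    # escaped: no match
intended = re.search(r"missing\.mp4", "missing.mp4")  # matches the literal dot
```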

36-36: Consider using next(iter(...)) for single-element access.

Same as in test_mot.py, next(iter(result.values())) is more idiomatic than list(result.values())[0].

♻️ Proposed fix
-    anns = list(result.values())[0]
+    anns = next(iter(result.values()))
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98844cc and 9716c48.

📒 Files selected for processing (8)
  • dagshub/auth/token_auth.py
  • dagshub/data_engine/annotation/importer.py
  • dagshub/data_engine/annotation/metadata.py
  • dagshub/data_engine/model/query_result.py
  • tests/data_engine/annotation_import/test_cvat_video.py
  • tests/data_engine/annotation_import/test_mot.py
  • tests/data_engine/conftest.py
  • tests/mocks/repo_api.py
✅ Files skipped from review due to trivial changes (1)
  • dagshub/auth/token_auth.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/data_engine/conftest.py (1)
dagshub/data_engine/client/models.py (3)
  • DatasourceType (39-42)
  • MetadataSelectFieldSchema (82-101)
  • PreprocessingStatus (32-36)
dagshub/data_engine/annotation/metadata.py (1)
dagshub/common/helpers.py (1)
  • log_message (100-107)
🪛 Ruff (0.15.1)
tests/data_engine/annotation_import/test_mot.py

[warning] 108, 122: Prefer next(iter(result.values())) over single element slice

(RUF015)


[warning] Unused function arguments: 190 (self, args, kwargs), 195 (video_annotations, context, video_file), 223 (video_annotations, video_file), 268 (self, args, kwargs), 274 (video_annotations, context, video_files), 308 (self, args, kwargs), 314 (video_annotations, context)

(ARG001)

tests/data_engine/annotation_import/test_cvat_video.py

[warning] 36: Prefer next(iter(result.values())) over single element slice

(RUF015)


[warning] Unused function arguments: 91, 119, 161, 189, 229 (self, args, kwargs); 196 (video_annotations); 198 (video_name); 199 (image_width); 200 (image_height)

(ARG001)


[warning] 235: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

dagshub/data_engine/model/query_result.py

[warning] Avoid specifying long messages outside the exception class: 811, 820, 931, 988, 1018-1021, 1034, 1037-1040, 1082, 1113-1116, 1137, 1140-1143

(TRY003)

dagshub/data_engine/annotation/importer.py

[warning] Avoid specifying long messages outside the exception class: 161, 289

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.10)
🔇 Additional comments (14)
tests/mocks/repo_api.py (1)

116-118: LGTM!

The new id property follows the existing mock pattern and provides a simple constant return value appropriate for testing.

tests/data_engine/conftest.py (1)

8-8: LGTM!

The DatasourceType import and source_type assignment properly enhance the mock datasource fixture to represent repository-backed sources, providing more complete test coverage for the new annotation workflows.

Also applies to: 28-29

dagshub/data_engine/annotation/metadata.py (2)

25-28: LGTM!

The VideoRectangleAnnotation import and task lookup registration enables parsing of video rectangle Label Studio tasks, which is needed for the new video annotation support.


279-299: No action required—discarding the second return value is intentional.

The load_coco_from_json_string function returns a tuple (grouped, context). The individual annotation objects in grouped already embed their category information (e.g., categories={"cat": 1.0}), so the global category metadata (second return value) is not needed. This pattern is consistent throughout the codebase (e.g., load_coco_from_file in importer.py also discards it), and tests confirm the method works correctly without accessing the second return value.

tests/data_engine/annotation_import/test_mot.py (1)

1-372: LGTM - Comprehensive test coverage for MOT format.

The test file provides thorough coverage of MOT import/export functionality including:

  • Video annotation dict detection and flattening
  • Import from directory and zip archives
  • Datasource path handling
  • Label Studio task conversion
  • Export with proper directory structure and dimensions handling
  • Multi-video export scenarios
dagshub/data_engine/annotation/importer.py (4)

68-70: LGTM!

Clean property to identify video formats, centralizing the format check logic.


165-171: LGTM!

Good defensive handling for the video annotation detection - checking dict type, non-empty, and using first key type to distinguish frame-indexed (int keys) from file-indexed (str keys) annotations.
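That detection logic can be sketched as (a hypothetical mirror of _is_video_annotation_dict, not the actual implementation):

```python
def is_video_annotation_dict(anns) -> bool:
    # Frame-indexed video annotation dicts use int frame numbers as keys;
    # file-indexed dicts use str paths. Empty or non-dict inputs are not video.
    if not isinstance(anns, dict) or not anns:
        return False
    return isinstance(next(iter(anns)), int)
```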


285-290: Appropriate handling of None filenames for video formats.

The fallback to new_filename for video formats with None annotation filenames is correct, as video annotations may not have per-frame filenames set.


447-449: The video_ir_to_ls_video_tasks function is from an undocumented/internal API in dagshub-annotation-converter (version ≥0.1.12); its behavior cannot be verified from public documentation.

The function being called is not part of the public API documentation, and there are no tests or comments in this codebase explaining why only the first task is used. Verify with the annotation-converter maintainers whether this function:

  • Always returns exactly one task per video file
  • Can return multiple tasks (and if so, whether taking [0] is intentional or a data loss issue)
dagshub/data_engine/model/query_result.py (4)

787-813: LGTM!

Well-designed helper methods for video export:

  • _get_all_video_annotations properly filters for IRVideoBBoxAnnotation
  • _prepare_video_file_for_export handles both direct path and source-prefix-prefixed paths
  • _get_annotation_filename correctly handles None, single, and list filename cases
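The three filename cases can be sketched as follows (`get_annotation_filename` is a hypothetical mirror of the helper, and picking the first list element is an assumption for illustration):

```python
from typing import List, Optional, Union


def get_annotation_filename(filename: Union[None, str, List[str]]) -> Optional[str]:
    # Annotations may store no filename, a single filename, or a list of them
    if filename is None:
        return None
    if isinstance(filename, list):
        return filename[0] if filename else None
    return filename
```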

815-823: LGTM!

Clean helper that centralizes annotation field resolution logic and improves maintainability of the export methods.


903-949: LGTM!

The COCO export follows the established pattern from YOLO export, with proper handling of source prefix, optional class mapping, and appropriate logging.


951-1050: LGTM - Well-structured video export methods.

Both export_as_mot and export_as_cvat_video follow consistent patterns:

  • Properly resolve annotation field
  • Handle single vs multi-source scenarios
  • Download videos when dimensions are missing for converter probing
  • Use appropriate converter functions
  • Provide clear error messages for missing files

Also applies to: 1052-1161

tests/data_engine/annotation_import/test_cvat_video.py (1)

1-276: LGTM - Thorough test coverage for CVAT video format.

Excellent coverage including:

  • Import from XML
  • Video annotation filtering and aggregation
  • Export with XML structure verification
  • Error handling for no/image-only annotations
  • Custom video name propagation
  • Multi-datapoint scenarios
  • Missing dimension and file handling

