Skip to content

fix: reject orphan identity references in update-pose/update-labels#409

Open
keithshep wants to merge 2 commits into
mainfrom
fix/label-remap-orphan-identity
Open

fix: reject orphan identity references in update-pose/update-labels#409
keithshep wants to merge 2 commits into
mainfrom
fix/label-remap-orphan-identity

Conversation

@keithshep

Copy link
Copy Markdown
Contributor

Summary

When a JABS project's annotation file references an identity index that the corresponding pose file no longer supplies (typically because the pose was regenerated with fewer identities and stale label entries were never cleaned out), jabs-cli update-pose and jabs-cli update-labels would crash mid-batch with an unhandled ValueError from pose_est_v8.get_bounding_boxes, after partial work had already been staged.

This PR makes both subcommands detect every affected (video, identity) at preflight and raise a single descriptive error before any backup or mutation. The error message also points users at the new --tolerate-orphan-identities escape hatch, which converts the abort into a stderr warning and lets the remap loop skip the affected (identity, behavior) pairs instead of failing.

What changed

  • Strict preflight (default behavior):

    • New _orphan_identities_in_annotation helper scans a JABS annotation JSON for identity keys (in both labels/unfragmented_labels and identity-scoped timeline annotations) that exceed the corresponding pose's num_identities.
    • New _handle_orphan_identities helper raises a single composite ValueError listing every offending (video, identity) across the whole run, with a clear remediation message.
    • _validate_pose_file now returns (format_major_version, num_identities) so preflight can do the orphan scan in the same pass over the project.
    • Both _preflight_update_inputs (update-pose) and _preflight_label_update_inputs (update-labels) collect orphan issues across their respective source annotations and call the handler before any backup or staging.
  • --tolerate-orphan-identities escape hatch:

    • When set, _handle_orphan_identities prints the same comprehensive listing to stderr as a WARNING: and returns without raising.
    • An inline per-(identity, behavior) guard inside _remap_labels_for_video warns and skips orphan pairs as the remap loop encounters them, so the algorithm doesn't crash. With the flag off this guard is unreachable (preflight catches first); it's load-bearing only in tolerate mode.
    • Orphan-skipped blocks are counted into skipped_count in the final summary. No failure annotation is written for them, because the root cause is in the source project and target-side annotations give the user no leverage on the fix.
    • Available on both subcommands; same parameter name, same semantics. The strict-mode error message tells users about the flag so it's discoverable.

User-visible behavior

Default (no flag): preflight aborts on orphan identities with a message like:

Source project has orphan identity references that cannot be matched against pose data:
  cage_2454.2025-02-14.06.45.mp4: source pose has 2 identities (valid indices 0 to 1) but annotation references identity 2

These identity indices have no corresponding entries in the pose file. Typically
the pose was regenerated with fewer identities, leaving stale label entries that
can no longer be matched against pose data.

Clean up the source annotation files (drop the orphan identity entries from the
affected .json files) and re-run, or pass --tolerate-orphan-identities to skip
the affected entries instead.

With --tolerate-orphan-identities: preflight emits the same listing as a stderr WARNING and proceeds. Each orphan (identity, behavior) pair gets a one-line warning when the remap loop reaches it; their blocks land in the final skipped count.

Tests

  • 5 new unit tests for _orphan_identities_in_annotation covering label-track keys, identity-scoped timeline annotations, clean files, missing files, and deduplication.
  • 3 new unit tests for _handle_orphan_identities covering the strict raise (including that the message mentions the flag), the tolerate warning, and the empty-issues no-op.
  • New preflight tests for both subcommands covering strict raise and tolerate-warn behavior.
  • New _remap_labels_for_video regression test for the inline orphan guard.
  • Existing CLI subcommand tests updated to verify the new flag is forwarded.
  • All fake_open_pose_file / _stub_pose_reader test helpers updated to expose num_identities for the new _validate_pose_file return type.

Docs

Both copies of cli-tools.md (docs/user-guide/ and src/jabs/resources/docs/user_guide/) describe the new strict orphan-identity behavior and the --tolerate-orphan-identities flag in each subcommand section.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents jabs-cli update-pose and jabs-cli update-labels from crashing mid-run when annotation JSONs reference identity indices that no longer exist in the corresponding pose files, by adding a preflight orphan-identity scan plus an optional --tolerate-orphan-identities warning-and-skip mode.

Changes:

  • Add orphan-identity detection + consolidated reporting during preflight for both subcommands.
  • Add --tolerate-orphan-identities to allow continuing with warnings and skipping orphan (identity, behavior) pairs during remap.
  • Add/adjust unit tests and update user-guide docs to cover the new behavior and flag.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/jabs/scripts/cli/update_pose.py Implements orphan detection/reporting and tolerate-mode remap skipping; extends pose validation to return num_identities.
src/jabs/scripts/cli/update_labels.py Wires the same orphan preflight behavior and CLI flag into label import/update.
tests/project/test_update_pose.py Adds unit/regression tests for orphan detection/handling and the inline remap guard; updates stubs for num_identities.
tests/project/test_update_labels.py Adds preflight tests for strict/tolerate orphan handling; updates stubs and CLI forwarding assertions.
docs/user-guide/cli-tools.md Documents strict orphan preflight behavior and the new tolerate flag for both commands.
src/jabs/resources/docs/user_guide/cli-tools.md Same documentation updates for the packaged user guide copy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +526 to +533
label_key = "unfragmented_labels" if "unfragmented_labels" in data else "labels"
for identity_key in data.get(label_key, {}):
try:
identity_index = int(identity_key)
except (TypeError, ValueError):
continue
if identity_index >= num_identities:
orphans.add(identity_index)
Comment on lines +576 to +582
for video, num_identities, orphans in issues:
orphans_str = ", ".join(str(i) for i in orphans)
lines.append(
f" {video}: {source_label} pose has {num_identities} identities "
f"(valid indices 0 to {num_identities - 1}) but annotation references "
f"identity {orphans_str}"
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants