variations in eval runner#779
Conversation
Introduce the variation framework: a VariationBase with build-time and run-time flavors, a SamplerBase abstraction with uniform and categorical samplers, and two concrete variations -- HDR dome-light image selection (build-time) and camera extrinsic decalibration (run-time). Variations attach to any Asset (scene objects or embodiments) and are collected by ArenaEnvBuilder, which applies build-time variations before scene composition and folds run-time variations into the event manager cfg. All variations default to disabled so existing envs are unchanged. The variations package exposes its API lazily (PEP 562 __getattr__): camera_decalibration pulls in torch and isaaclab.sensors, and importing that pair before the SimulationApp launches corrupts USD's Python bindings. Lazy exports keep importing the package -- or its lightweight base/sampler submodules -- safe at module-load time (e.g. pytest collection). Signed-off-by: alex <amillane@nvidia.com>
- Rename camera_decalibration.py -> camera_decalibration_variation.py - Drop package-level re-exports; import concrete classes from submodules - Remove sampler/variation listener plumbing (deferred to a follow-up MR) - Hoist a compulsory sampler_cfg field onto VariationBaseCfg and rename the per-variation sampler field to sampler_cfg - Replace UniformSampler.event_shape with an abstract SamplerBase.shape_per_sample - Drop the camera variation's unused mode field (all variations fire on reset) - Move attribute docs onto each member and trim docstrings to one line Signed-off-by: alex <amillane@nvidia.com>
Agents drove the container with bare `docker exec` (root), while users attach via run_docker.sh as their host user (su $(id -un)). Root-run commands left root-owned files in shared caches (/tmp/Assets, ~/.config) that the host user could not read, breaking their later interactive runs. Update AGENTS.md and the dev-container/run-tests skills to route every in-container command through `su $(id -un) -c`, mirroring run_docker.sh. Signed-off-by: alex <amillane@nvidia.com>
…ariations_in_eval_runner
…re/variations_in_eval_runner
alexmillane
left a comment
There was a problem hiding this comment.
Self review.
| Mirrors policy_runner's ``--list-variations`` but, because each eval job defines its own | ||
| environment via ``arena_env_args``, the catalogue is printed per job. Each job's own | ||
| ``variations`` overrides are applied so the catalogue reflects what the run would use. | ||
| Must be called inside a ``SimulationAppContext``. |
There was a problem hiding this comment.
Remove long docstring
| registered set. The ``sampler_cfg.low`` / ``sampler_cfg.high`` vectors widen the camera | ||
| extrinsics jitter range to ±10 mm per axis. | ||
|
|
||
| Configuring variations in an eval jobs config |
There was a problem hiding this comment.
Move this section below the one below, to the end of the file.
| with open(args_cli.eval_jobs_config, encoding="utf-8") as f: | ||
| eval_jobs_config = json.load(f) | ||
|
|
||
| # Print the variations catalogue for each job's environment and exit, without running anything. |
There was a problem hiding this comment.
shorten to "Print the variations catalogue for each job's environment and exit."
There was a problem hiding this comment.
Delete this file. This was deleted on main and is being accidentally readded here.
| """Drive the eval-config path (Job.from_dict -> load_env) and check the override reached a manager. | ||
|
|
||
| A runtime variation enabled via the job's ``variations`` block must surface as an | ||
| ``EventTermCfg`` in the composed ``env_cfg.events`` manager. We use the droid wrist-camera | ||
| extrinsics variation, which produces the event term ``wrist_camera_extrinsics_variation``. | ||
| """ |
There was a problem hiding this comment.
Shorten to: "Enable a wrist camera extrinsics variation and check that it shows up as an event term in the cfg."
| "num_steps": NUM_STEPS, | ||
| "policy_type": "zero_action", | ||
| "policy_config_dict": {}, | ||
| "variations": {"droid_abs_joint_pos": {f"camera_extrinsics_{camera_name}": {"enabled": True}}}, |
There was a problem hiding this comment.
Put a comment before this line "Enabling wrist camera extrinsics variation."
| def test_job_from_dict_with_variations(): | ||
| """Job.from_dict populates job.variations from a nested variations block.""" | ||
| job_dict = { | ||
| "name": "job_with_variations", | ||
| "arena_env_args": {"environment": "env1"}, | ||
| "policy_type": "zero_action", | ||
| "policy_config_dict": {}, | ||
| "variations": {"light": {"hdr_image": {"enabled": True}}}, | ||
| } | ||
|
|
||
| job = Job.from_dict(job_dict) | ||
|
|
||
| assert job.variations == ["light.hdr_image.enabled=true"] |
There was a problem hiding this comment.
Combine this test with the test test_job_convert_variations_dict_to_hydra_overrides above.
| def test_job_from_dict_without_variations_defaults_empty(): | ||
| """A job without a variations field defaults to no overrides.""" | ||
| job_dict = { | ||
| "name": "job_without_variations", | ||
| "arena_env_args": {"environment": "env1"}, | ||
| "policy_type": "zero_action", | ||
| "policy_config_dict": {}, | ||
| } | ||
|
|
||
| job = Job.from_dict(job_dict) | ||
|
|
||
| assert job.variations == [] |
There was a problem hiding this comment.
Combine this test with test_job_from_dict_with_variations above.
There was a problem hiding this comment.
Delete this file. Added by accident.
Greptile SummaryThis PR wires the
Confidence Score: 4/5Safe to merge after addressing the null-variations crash in job_manager.py. The one concrete defect is in isaaclab_arena/evaluation/job_manager.py — the Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant eval_runner as eval_runner.py
participant JobManager
participant Job
participant ArenaEnvBuilder
User->>eval_runner: --eval_jobs_config config.json
eval_runner->>eval_runner: json.load(config)
eval_runner->>JobManager: JobManager(eval_jobs_config["jobs"])
JobManager->>Job: Job.from_dict(job_dict)
Job->>Job: "convert_variations_dict_to_hydra_overrides({"light": {"hdr_image": {"enabled": true}}})"
Job-->>JobManager: "job.variations = ["light.hdr_image.enabled=true"]"
JobManager-->>eval_runner: job_manager
alt --list-variations flag
eval_runner->>ArenaEnvBuilder: "get_arena_builder_from_cli(hydra_overrides=job.variations)"
ArenaEnvBuilder-->>eval_runner: catalogue string
eval_runner->>User: print catalogue and return
else normal run
loop for each job rebuild
eval_runner->>eval_runner: "load_env(job.arena_env_args, job.name, variations=job.variations)"
eval_runner->>ArenaEnvBuilder: "get_arena_builder_from_cli(hydra_overrides=job.variations)"
ArenaEnvBuilder->>ArenaEnvBuilder: apply Hydra overrides to env cfg
ArenaEnvBuilder-->>eval_runner: env
eval_runner->>eval_runner: rollout_policy(env, policy)
end
end
Reviews (2): Last reviewed commit: "Revert the submodule." | Re-trigger Greptile |
| policy_config_dict=data["policy_config_dict"], | ||
| status=status, | ||
| language_instruction=data.get("language_instruction"), | ||
| variations=cls.convert_variations_dict_to_hydra_overrides(data.get("variations", {})), |
There was a problem hiding this comment.
"variations": null in JSON crashes with a misleading error
data.get("variations", {}) returns the default {} only when the key is absent. If the JSON contains "variations": null (explicit JSON null), it returns None. Passing None into convert_variations_dict_to_hydra_overrides pushes ("", None) onto the stack; when that item is popped, isinstance(None, dict) is False and the code falls through to assert prefix != "", which fires with AssertionError: "Variation override path must be non-empty" — a message that gives no hint that null is the actual problem. Replace with data.get("variations") or {} to treat both absent and null as "no overrides".
| if isinstance(node, dict): | ||
| for key, value in node.items(): | ||
| assert key != "", "Variation override keys must be non-empty" | ||
| child_prefix = f"{prefix}.{key}" if prefix else str(key) | ||
| stack.append((child_prefix, value)) |
There was a problem hiding this comment.
The stack-based DFS pops in LIFO order, so sibling keys are emitted in reverse insertion order. While Hydra override semantics are generally order-independent, reversing the author's dict order is surprising. Switching to
reversed(list(node.items())) before appending preserves input order with no algorithmic cost.
| if isinstance(node, dict): | |
| for key, value in node.items(): | |
| assert key != "", "Variation override keys must be non-empty" | |
| child_prefix = f"{prefix}.{key}" if prefix else str(key) | |
| stack.append((child_prefix, value)) | |
| if isinstance(node, dict): | |
| for key, value in reversed(list(node.items())): | |
| assert key != "", "Variation override keys must be non-empty" | |
| child_prefix = f"{prefix}.{key}" if prefix else str(key) | |
| stack.append((child_prefix, value)) |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| assert key != "", "Variation override keys must be non-empty" | ||
| child_prefix = f"{prefix}.{key}" if prefix else str(key) | ||
| stack.append((child_prefix, value)) | ||
| continue | ||
| assert prefix != "", "Variation override path must be non-empty" |
There was a problem hiding this comment.
assert for runtime input validation
Python strips assert statements when the interpreter runs with the -O (optimize) flag. Both assert key != "" and assert prefix != "" are meant to guard against bad user input, but they become no-ops under optimization. Prefer raising ValueError with a descriptive message so the check is always enforced regardless of how the process is launched.
| "num_rebuilds": 5, | ||
| "policy_type": "zero_action", | ||
| "policy_config_dict": {}, | ||
| "variations": { |
There was a problem hiding this comment.
Is this submodule bump intentional?
There was a problem hiding this comment.
Not at all. Good catch! That would have reintroduced the texture bug no doubt.
cvolkcvolk
left a comment
There was a problem hiding this comment.
Nice! lgtm. Keep an eye on the submodule bump.
| policy_config_dict=data["policy_config_dict"], | ||
| status=status, | ||
| language_instruction=data.get("language_instruction"), | ||
| variations=cls.convert_variations_dict_to_hydra_overrides(data.get("variations", {})), |
There was a problem hiding this comment.
data.get("variations", {}) only returns the {} default when the key is absent. If the JSON contains "variations": null (explicit JSON null), .get returns None, and convert_variations_dict_to_hydra_overrides(None) pushes ("", None) onto the stack. When that item is popped, isinstance(None, dict) is False and the code hits assert prefix != "" — raising AssertionError: "Variation override path must be non-empty", with no hint that null is the root cause. Using or {} treats both absent and explicit-null as "no overrides".
| variations=cls.convert_variations_dict_to_hydra_overrides(data.get("variations", {})), | |
| variations=cls.convert_variations_dict_to_hydra_overrides(data.get("variations") or {}), |
Summary
Connects the
eval_runner.pyto the variations.Detailed description
Job/jsonto add a field for variations.ArenaEnvBuilderfor building.droid_pnp_variations_config.jsonthat demonstrates how to use this feature and added it to the (placeholder) docs.