[Feature] Introduce mujoco_playground environment wrapper#3751

Open
itwasabhi wants to merge 9 commits into pytorch:main from itwasabhi:mujoco_playground0

Conversation

@itwasabhi

Contributor

Introduces a TorchRL wrapper for MuJoCo Playground environments. The wrapper also supports multi-agent decomposition.

Motivation and Context

#3733

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@pytorch-bot

pytorch-bot Bot commented May 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3751

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures, 1 Cancelled Job, 2 Pending, 1 Unrelated Failure

As of commit b3c41ee with merge base 8f7575d (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 14, 2026
@github-actions

Contributor

⚠️ PR Title Label Error

PR title must start with a label prefix in brackets (e.g., [BugFix]).

Current title: Introduce mujoco_playground environment wrapper.

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

@github-actions github-actions Bot added Documentation Improvements or additions to documentation CI Has to do with CI setup (e.g. wheels & builds, tests...) Environments Adds or modifies an environment wrapper Examples Trainers labels May 14, 2026
@itwasabhi itwasabhi changed the title Introduce mujoco_playground environment wrapper. [Feature] Introduce mujoco_playground environment wrapper. May 14, 2026
@github-actions github-actions Bot added the Feature New feature label May 14, 2026
@vmoens vmoens force-pushed the mujoco_playground0 branch from c7a92d5 to 15de4dd Compare May 19, 2026 10:21
@itwasabhi

Contributor Author

Thanks for the fixes @vmoens

@itwasabhi itwasabhi force-pushed the mujoco_playground0 branch from 15de4dd to 916baf0 Compare May 21, 2026 21:26
@itwasabhi

Contributor Author

The failing mujoco_playground tests should now pass.

@itwasabhi

Contributor Author

@vmoens any other requested changes? I believe the remaining test failures are unrelated.

itwasabhi and others added 4 commits June 4, 2026 10:07
Introduces a torch-rl wrapper of the mujoco playground environment.
Wrapper also supports multi-agent decomposition.
- CI: install `playground` (PyPI name) instead of import name
  `mujoco_playground`; the previous `pip install mujoco_playground` was the
  root cause of the current `unittests-mujoco-playground` CI failure.
- CI: drop the broken try/except JAX-init fallback in run_test.sh — it ran
  in a subprocess that exited immediately, so `JAX_PLATFORM_NAME` never
  reached pytest and any real "GPU not visible" error was hidden.
- Wrapper: docstrings no longer advertise a `state` field that the env
  never emits; the JAX state is intentionally kept on
  `self._current_state` rather than round-tripped through TensorDict (this
  is now documented as a `.. note::` block).
- Wrapper: freeze `MujocoPlaygroundAgentSpec` /
  `MujocoPlaygroundAgentMapping` dataclasses and deep-copy
  `KNOWN_MARL_MAPPINGS` entries on string lookup so users cannot mutate
  the module-level mapping by accident.
- Wrapper: emit a `UserWarning` when resolving a string against
  `KNOWN_MARL_MAPPINGS`, since those indices target Brax's observation
  layout and may not be semantically equivalent for mujoco_playground envs.
- Wrapper: document the policy contract for `homogenization_mode='max'`
  and `'concat'` (which action/obs entries are real vs padding/discarded).
- Wrapper: align `_MujocoPlaygroundMeta` num_workers handling with
  `_BraxMeta`; make `agent_mapping` and `config`/`config_overrides`
  keyword-only; accept `seed=None` in `_set_seed` (defaults to 0, matching
  the `_reset` fallback) instead of raising bare `Exception`; document
  `_listerize`'s inclusive-range semantics; drop unused
  `pixels_only`/`camera_id`/`render_kwargs` from `_build_env`.
- Example: rewrite `save_visualization` in
  `profile_mujoco_playground_collector.py` to snapshot
  `env._current_state` during a manual rollout, removing dependencies on
  non-existent `env._state_example` and `td["state"]`.
- Config: drop stale `categorical_action_encoding` field on
  `MujocoPlaygroundEnvConfig`; add `agent_mapping` and `num_workers` so
  the MARL and parallel-process knobs are reachable from the config
  system.
- Tests: replace three duplicated `_setup_jax` fixtures with a single
  module-level autouse fixture; introduce a session-scoped
  `marl_env_sizes` fixture to avoid constructing a throwaway env in every
  MARL test; expect the new MABrax warning on string lookups; add a
  new `TestMujocoPlaygroundDictObs` class covering reset /
  `check_env_specs` and a negative test that
  `homogenization_mode != 'none'` raises `NotImplementedError` for
  dict-obs envs; make `test_no_mapping_regression` actually exercise the
  MARL env it constructs.
- Docs: keep the `MOGymEnv` / `MOGymWrapper` pair adjacent in
  `envs_libraries.rst`; `__repr__` now uses `env_name=` to match the
  constructor kwarg.
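
The frozen-dataclass, deep-copy-on-lookup, and warning points in the list above can be sketched roughly as follows. This is an illustration only, not the wrapper's code: `AgentSpec`, `KNOWN_MAPPINGS`, and `resolve_mapping` are hypothetical stand-ins for `MujocoPlaygroundAgentSpec`, `KNOWN_MARL_MAPPINGS`, and the string-lookup path, and the key and index values are made up.

```python
import copy
import warnings
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class AgentSpec:
    # Hypothetical stand-in for MujocoPlaygroundAgentSpec.
    name: str
    obs_indices: list = field(default_factory=list)


# Hypothetical stand-in for the module-level KNOWN_MARL_MAPPINGS table.
KNOWN_MAPPINGS = {
    "ant_2x4": [AgentSpec("agent_0", [0, 1, 2]), AgentSpec("agent_1", [3, 4, 5])],
}


def resolve_mapping(key: str) -> list:
    """Return a private copy of a registered mapping and warn about its origin."""
    warnings.warn(
        f"Mapping {key!r} targets another library's observation layout; "
        "check that the indices make sense for this env.",
        UserWarning,
        stacklevel=2,
    )
    # frozen=True blocks attribute rebinding but not in-place list mutation,
    # so the deep copy is what protects the module-level table.
    return copy.deepcopy(KNOWN_MAPPINGS[key])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mapping = resolve_mapping("ant_2x4")

mapping[0].obs_indices.append(99)  # mutates the caller's copy only

try:
    mapping[0].name = "renamed"  # rebinding a field on a frozen dataclass fails
except FrozenInstanceError:
    frozen_ok = True
else:
    frozen_ok = False
```

Freezing alone would not be enough here, because a frozen dataclass can still hold a mutable list; the copy on lookup is what keeps the shared table intact.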
@itwasabhi itwasabhi force-pushed the mujoco_playground0 branch from 916baf0 to cc6c461 Compare June 4, 2026 09:10
@itwasabhi

Contributor Author

@vmoens OK with these changes?

@vmoens vmoens changed the title [Feature] Introduce mujoco_playground environment wrapper. [Feature] Introduce mujoco_playground environment wrapper Jun 8, 2026
vmoens and others added 2 commits June 8, 2026 10:27
…e_skip

Several env wrappers (Brax, Jumanji, MuJoCo Playground, ...) accept a
``frame_skip`` argument but silently ignore it because they don't loop
over it in their own ``_step``. Capture this once in the ``EnvBase``
metaclass: when ``frame_skip > 1`` and the env does not implement it
natively, wrap the env in a ``FrameSkipTransform``.

A new ``EnvBase._has_frame_skip`` class attribute (default ``False``)
gates the behaviour. Envs that implement frame-skipping in their own
``_step`` opt out by setting it to ``True``:

- ``GymLikeEnv`` (loops over ``wrapper_frame_skip``) -> True, which also
  covers Gym, DMControl, RoboHive, Habitat, IsaacGym, SafetyGymnasium.
- ``GenesisWrapper`` (loops over ``_frame_skip``) -> True.
- ``JumanjiWrapper`` overrides ``_step`` without the loop, so it resets
  the inherited flag back to False and now gets the auto-transform.

The default construction path is unchanged (the block only triggers for
``frame_skip > 1``), so there is no impact on existing envs. Building the
``TransformedEnv`` does not recurse: ``TransformedEnv`` is created via
``_TEnvPostInit`` which bypasses ``_EnvPostInit.__call__``.
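
The mechanism described above can be sketched in plain Python, without TorchRL. Everything here is a simplified assumption: `EnvPostInit`, `EnvBase`, and `FrameSkipWrapper` are hypothetical stand-ins for `_EnvPostInit`, the real `EnvBase`, and a `TransformedEnv` carrying a `FrameSkipTransform`, and the reward-summing loop is just one plausible frame-skip behaviour chosen for the sketch.

```python
class FrameSkipWrapper:
    """Hypothetical stand-in for a TransformedEnv with a FrameSkipTransform."""

    def __init__(self, env, frame_skip):
        self.env = env
        self.frame_skip = frame_skip

    def step(self, action):
        # Repeat the action and sum rewards; stop early if the episode ends.
        total = 0.0
        done = False
        for _ in range(self.frame_skip):
            reward, done = self.env.step(action)
            total += reward
            if done:
                break
        return total, done


class EnvPostInit(type):
    """Hypothetical metaclass: wrap instances that do not skip frames natively."""

    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)
        frame_skip = instance.__dict__.get("frame_skip", 1)
        if frame_skip > 1 and not cls._has_frame_skip:
            # FrameSkipWrapper uses the plain `type` metaclass, so building it
            # does not re-enter this __call__.
            return FrameSkipWrapper(instance, frame_skip)
        return instance


class EnvBase(metaclass=EnvPostInit):
    _has_frame_skip = False  # subclasses that loop in their own step set True

    def __init__(self, frame_skip=1):
        self.frame_skip = frame_skip
        self.calls = 0

    def step(self, action):
        self.calls += 1
        return 1.0, False


class NativeSkipEnv(EnvBase):
    _has_frame_skip = True  # opts out of the automatic wrapping

    def step(self, action):
        total = 0.0
        for _ in range(self.frame_skip):
            reward, _done = super().step(action)
            total += reward
        return total, False


plain = EnvBase()                      # frame_skip == 1: returned unchanged
wrapped = EnvBase(frame_skip=4)        # no native support: wrapped automatically
native = NativeSkipEnv(frame_skip=4)   # native support: left alone
```

In this sketch the default path (`frame_skip == 1`) never reaches the wrapping branch, which mirrors the claim that existing envs are unaffected.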

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Cache the registry env listing (`available_envs`) once per process
  instead of re-enumerating the suites on every classproperty access.
- Precompute per-agent obs/action dims and index tensors in
  `_make_marl_specs` so the hot-path `_split_obs_for_agents` /
  `_reconstruct_global_action` no longer recompute `max_obs`/`raw_sizes`
  on every step. Behaviour is unchanged (identical shapes/values).
- Drop the redundant `wrapper_frame_skip = 1` line: `frame_skip > 1` is
  now handled by the auto-appended `FrameSkipTransform`.
- Document the two limitations surfaced in review: partial resets are not
  supported for vmapped batches (the JAX state lives on the instance), and
  `terminated` is aliased to `done` (no separate truncation signal).
- Document the public `MujocoPlaygroundAgentMapping` /
  `MujocoPlaygroundAgentSpec` dataclasses in the reference docs.
- Add a `frame_skip` test class and the missing
  `if __name__ == "__main__"` pytest entrypoint.
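
The once-per-process caching of the env listing mentioned in the first bullet can be sketched like this. It is a minimal illustration under stated assumptions: `_enumerate_suites`, `_cached_available_envs`, and `Wrapper` are hypothetical names, the env names are made up, and a plain instance `property` is used in place of the wrapper's classproperty.

```python
import functools

enumeration_count = 0  # counts how often the expensive listing actually runs


def _enumerate_suites():
    """Hypothetical stand-in for walking every registry suite."""
    global enumeration_count
    enumeration_count += 1
    return ("SuiteA/EnvOne", "SuiteB/EnvTwo")


@functools.lru_cache(maxsize=None)
def _cached_available_envs():
    # An immutable tuple is cached so callers cannot corrupt the shared value.
    return _enumerate_suites()


class Wrapper:
    @property
    def available_envs(self):
        # Hand out a fresh list on each access; the enumeration itself runs once.
        return list(_cached_available_envs())


first = Wrapper().available_envs
second = Wrapper().available_envs
```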

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Environments/jumanji Triggers jumanji environment tests only Transforms labels Jun 8, 2026
vmoens and others added 3 commits June 8, 2026 10:29
…attr

The auto-FrameSkipTransform block in `_EnvPostInit.__call__` read
`getattr(instance, "frame_skip", 1)` on every env construction. On a
batched env (`ParallelEnv`/`SerialEnv`) `frame_skip` is not a real
attribute, so this fell through to `BatchedEnv.__getattr__`, which for a
lazy (non-started) env calls `self.start()` and raises `RuntimeError`
("Trying to access attributes of closed/non started environments").
`getattr`'s default only swallows `AttributeError`, so the RuntimeError
propagated and broke construction of every lazy batched env (observed
across brax, jumanji, mujoco_playground num_workers tests and any plain
`ParallelEnv(...)` in the suite), in addition to spuriously starting the
workers.

Read `frame_skip` straight from `instance.__dict__` instead: it is set
there by `_EnvWrapper.__init__`, and batched/native envs that don't set
it simply default to 1 and are skipped without ever touching
`__getattr__`. Adds a regression test constructing lazy `SerialEnv` /
`ParallelEnv` of `CountingEnv`.
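
The failure mode and the fix can be reproduced in a few lines of plain Python. `LazyBatchedEnv` is a hypothetical stand-in for a non-started batched env; only the error message is taken from the description above.

```python
class LazyBatchedEnv:
    """Hypothetical stand-in for a batched env that has not been started."""

    def __getattr__(self, name):
        # Only called when normal lookup fails; mimics the "closed env" error.
        raise RuntimeError(
            "Trying to access attributes of closed/non started environments"
        )


env = LazyBatchedEnv()

try:
    # The default is only used for AttributeError, so RuntimeError escapes.
    getattr(env, "frame_skip", 1)
    getattr_result = "returned default"
except RuntimeError:
    getattr_result = "raised RuntimeError"

# Reading the instance dict never triggers __getattr__.
dict_result = env.__dict__.get("frame_skip", 1)
```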

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Labels

CI Has to do with CI setup (e.g. wheels & builds, tests...) CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Documentation Improvements or additions to documentation Environments/jumanji Triggers jumanji environment tests only Environments Adds or modifies an environment wrapper Examples Feature New feature Trainers Transforms tutorials/

2 participants