[Feature] Mask-aware BCLoss; LossModule._reduce_loss honors ("collector", "mask") by theap06 · Pull Request #3850 · pytorch/rl

theap06 · 2026-06-11T06:59:43Z

## Summary

Closes the loss-side gap from the sequence-RL composability work landed in #3695: `SliceSampler(pad_output=True)` writes `("collector", "mask")` alongside the padded batch, but no loss in the repo read that mask, so padded positions were silently averaged into the gradient.
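
To make the bias concrete, here is a minimal plain-PyTorch sketch with made-up numbers (not taken from the PR): two trajectories padded to length 4, where the shorter one has two padding slots whose per-step loss happens to be zero. Even then, a plain `.mean()` counts those slots in the denominator, while selecting with the mask does not.

```python
import torch

# Illustrative padded batch: 2 trajectories, padded to length 4.
# Trajectory 2 has only 2 real steps; its last 2 slots are zero-loss padding.
per_step_loss = torch.tensor([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 0.0, 0.0],
])
# True marks a real step, False marks padding (the role ("collector", "mask") plays).
mask = torch.tensor([
    [True, True, True, True],
    [True, True, False, False],
])

# Unmasked: divides the sum (21) by all 8 slots, so padding drags the loss down.
naive_loss = per_step_loss.mean()
# Masked: boolean indexing keeps only the 6 real steps before averaging.
masked_loss = per_step_loss[mask].mean()

print(f"naive={naive_loss.item():.3f} masked={masked_loss.item():.3f}")
# naive=2.625 masked=3.500
```

Because the padded slots inflate the denominator, every real step's gradient is scaled by 1/8 instead of 1/6 here; padding with nonzero loss values would additionally shift the numerator.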

Extends `LossModule._reduce_loss` (`torchrl/objectives/common.py`) to look up `("collector", "mask")` first, falling back to the legacy `"shifted_valid"` key so that the existing PPO / A2C / Reinforce callers retain their behavior exactly.
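
The lookup precedence can be sketched in plain Python. This illustrates the described order only and is not torchrl's implementation: `find_loss_mask` is a hypothetical name, and a `dict` with tuple keys stands in for a TensorDict's nested keys.

```python
from typing import Any, Mapping, Optional

# Key names as described in the PR; a tuple models a nested TensorDict key.
COLLECTOR_MASK_KEY = ("collector", "mask")
LEGACY_MASK_KEY = "shifted_valid"


def find_loss_mask(data: Mapping[Any, Any]) -> Optional[Any]:
    """Hypothetical helper: pick the mask a loss reduction should honor.

    Prefers the sampler-written ("collector", "mask") entry, falls back to the
    legacy "shifted_valid" entry, and returns None when neither is present so
    the caller can take the unmasked path unchanged.
    """
    for key in (COLLECTOR_MASK_KEY, LEGACY_MASK_KEY):
        if key in data:
            return data[key]
    return None
```

When both keys are present the collector mask wins; when neither is present the helper returns `None` rather than synthesizing an all-True mask, which is one way to keep the unmasked path literally unchanged.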
Migrates `BCLoss.forward()` from `_reduce(loss, reduction=self.reduction)` to `self._reduce_loss(loss, tensordict=tensordict)` as the reference adoption. When the mask key is absent, the output is byte-identical to the old path; when it is present, padded positions are excluded from the time-averaging.
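
A minimal sketch of a mask-aware mean reduction, assuming a boolean mask with the same shape as the per-step loss. `masked_mean` is a hypothetical helper, not torchrl's `_reduce`; it only illustrates the two behaviors described above (the old result when no mask is given, padded positions excluded when one is) plus a denominator clamp for the all-False case.

```python
from typing import Optional

import torch


def masked_mean(loss: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Hypothetical mask-aware mean over all positions of `loss`."""
    if mask is None:
        # No mask: exactly the old unmasked path, so existing callers are unaffected.
        return loss.mean()
    mask = mask.to(torch.bool)
    # torch.where (instead of loss * mask) keeps NaN or inf values computed on
    # padded positions out of the sum, since NaN * 0 is still NaN.
    total = torch.where(mask, loss, torch.zeros_like(loss)).sum()
    # Clamping the count to at least 1 turns an all-False mask into 0 / 1 = 0
    # rather than 0 / 0 = NaN.
    count = mask.sum().clamp(min=1)
    return total / count
```

The `torch.where` choice matters in the forward pass: if the loss at a padded slot is NaN, `loss * mask` would still produce NaN, whereas `torch.where` drops it. It does not by itself protect against NaN gradients produced upstream of the padded values.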
Adds 5 mask-aware tests in `test/objectives/test_bc.py`:
- back-compat without mask key
- all-True mask is a no-op
- partial mask matches the loss computed on the real subset
- gradient through masked loss matches gradient through subset
- all-False mask reduces to 0 without NaN (denominator clamp)
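
The four mask-dependent checks can be sketched as plain assertions on a toy squared-error loss. This is not the PR's `test_bc.py`; `masked_mean`, `pred`, and `target` are hypothetical stand-ins chosen so the comparison against the unpadded subset is easy to read.

```python
import torch


def masked_mean(loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical masked mean; the count is clamped so an empty mask gives 0, not NaN.
    total = torch.where(mask, loss, torch.zeros_like(loss)).sum()
    return total / mask.sum().clamp(min=1)


torch.manual_seed(0)
pred = torch.randn(2, 4, requires_grad=True)  # toy predicted actions
target = torch.randn(2, 4)                    # toy expert actions
mask = torch.tensor([[True, True, True, True],
                     [True, True, False, False]])

# All-True mask: same value as the plain mean.
full = masked_mean((pred - target) ** 2, torch.ones_like(mask))
torch.testing.assert_close(full, ((pred - target) ** 2).mean())

# Partial mask: equals the plain mean over the real (unpadded) subset.
masked = masked_mean((pred - target) ** 2, mask)
subset = ((pred[mask] - target[mask]) ** 2).mean()
torch.testing.assert_close(masked, subset)

# Gradients agree, and padded positions receive exactly zero gradient.
(grad_masked,) = torch.autograd.grad(masked, pred)
(grad_subset,) = torch.autograd.grad(subset, pred)
torch.testing.assert_close(grad_masked, grad_subset)
assert torch.all(grad_masked[~mask] == 0)

# All-False mask: the clamped denominator yields 0 instead of NaN.
empty = masked_mean((pred - target) ** 2, torch.zeros_like(mask))
assert empty.item() == 0.0
```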

Follow-up PRs can adopt the same one-line change in PPO, A2C, IQL, CQL, SAC, DDPG, TD3, GAIL, DT, and Reinforce. Each is reviewable in minutes once this base pattern lands. The reduction infrastructure on `_reduce(..., mask=, weights=)` already existed; this PR just wires the input-side tensordict lookup into the common helper that losses already inherit.
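
The "one-line change" follows from putting the lookup and reduction in a helper that every loss inherits. The sketch below shows that shape with hypothetical classes (`MaskAwareLoss`, `ToyBCLoss`) and hypothetical data keys (`"action"`, `"action_pred"`), using a plain `dict` in place of a TensorDict. It is not torchrl's `LossModule` and does not model the `weights=` argument.

```python
from typing import Any, Mapping

import torch
from torch import nn


class MaskAwareLoss(nn.Module):
    """Hypothetical base class: subclasses inherit the mask-aware reduction."""

    mask_keys = (("collector", "mask"), "shifted_valid")  # lookup order

    def _reduce_loss(self, loss: torch.Tensor, data: Mapping[Any, Any]) -> torch.Tensor:
        mask = next((data[k] for k in self.mask_keys if k in data), None)
        if mask is None:
            return loss.mean()  # legacy, unmasked path
        mask = mask.to(torch.bool)
        total = torch.where(mask, loss, torch.zeros_like(loss)).sum()
        return total / mask.sum().clamp(min=1)  # clamp avoids 0 / 0


class ToyBCLoss(MaskAwareLoss):
    """Hypothetical behavior-cloning loss: squared error to expert actions."""

    def forward(self, data: Mapping[Any, Any]) -> torch.Tensor:
        # Per-step loss of shape [batch, time], summed over the action dim.
        per_step = ((data["action_pred"] - data["action"]) ** 2).sum(-1)
        # The only line a migrated loss needs: delegate to the shared helper.
        return self._reduce_loss(per_step, data)


# Usage with toy data (all names hypothetical).
data = {
    "action": torch.zeros(2, 3, 1),
    "action_pred": torch.arange(1.0, 7.0).reshape(2, 3, 1),
    ("collector", "mask"): torch.tensor([[True, True, True], [True, False, False]]),
}
loss_fn = ToyBCLoss()
print(loss_fn(data))  # mean over the 4 unmasked steps only
```

Because subclasses only call `self._reduce_loss(...)`, changing the lookup order or the clamp happens in one place, which is the property the follow-up migrations would rely on.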

pytorch-bot · 2026-06-11T06:59:47Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3850

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the `CLA Signed` label on Jun 11, 2026.

github-actions bot added the `Objectives` and `Feature` labels on Jun 11, 2026.

theap06 wants to merge 1 commit into `pytorch:main` from `theap06:feat/mask-aware-losses`.
