Fix webgpu native SDPA test: select attention output by position, not numel by shoumikhin · Pull Request #20282 · pytorch/executorch

shoumikhin · 2026-06-15T14:54:34Z

Summary

test-webgpu-native / linux-job (Test WebGPU Native (Dawn)) has been red on every commit since the WebGPU SDPA test suite landed on June 13. This is a test-harness bug, not a kernel bug.

The sdpa_with_kv_cache sweep selects the attention output among the mutating op's outputs by matching the element count (numel). For the llama1b_prefill config (Hq=32, Hkv=8, D=64, S=128, Cmax=512) that selection is ambiguous:

attention output numel = S*Hq*D = 128*32*64 = 262144
k cache numel = v cache numel = Cmax*Hkv*D = 512*8*64 = 262144

Because S*Hq == Cmax*Hkv (both 4096), all three tensors have numel 262144, so all three match and the test fails with "ambiguous attention output: 3 tensors match numel 262144". Every other shape passes (max error ~1e-8), so the kernel itself is correct.
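The collision can be checked with a few lines of arithmetic. This is an illustrative sketch only: `SdpaConfig` and the helper functions are hypothetical and not part of the test file. It recomputes the numel values quoted above and counts how many of the three outputs a numel-based search would match.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one SDPA sweep config; the field names mirror
// the symbols used in the PR text (Hq, Hkv, D, S, Cmax).
struct SdpaConfig {
  int64_t Hq, Hkv, D, S, Cmax;
};

// Element count of the attention output: S * Hq * D.
int64_t attn_numel(const SdpaConfig& c) { return c.S * c.Hq * c.D; }

// Element count of each KV cache: Cmax * Hkv * D.
int64_t cache_numel(const SdpaConfig& c) { return c.Cmax * c.Hkv * c.D; }

// Mimics the old numel-based search: counts how many outputs in the fixed
// order [k_cache, v_cache, attn_output] have the attention output's numel.
int count_numel_matches(const SdpaConfig& c) {
  const std::vector<int64_t> outputs = {cache_numel(c), cache_numel(c), attn_numel(c)};
  int matches = 0;
  for (int64_t n : outputs) {
    if (n == attn_numel(c)) ++matches;
  }
  return matches;
}

// Since D is shared, the search is ambiguous exactly when S*Hq == Cmax*Hkv.
bool numel_selection_is_ambiguous(const SdpaConfig& c) {
  return c.S * c.Hq == c.Cmax * c.Hkv;
}
```

With the llama1b_prefill values (Hq=32, Hkv=8, D=64, S=128, Cmax=512) all three outputs match; a hypothetical decode-style config with S=1 leaves only the attention output matching.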

Fix

The mutating op returns [k_cache, v_cache, attn_output] in a fixed order (already documented in the original comment), so the attention output is always the last output. Select the last output directly and keep a numel sanity check. Test-only change.
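A minimal sketch contrasting the two selection strategies, assuming a simplified `FakeOutput` record (hypothetical; the real test works on the op's actual outputs and takes the last one via `outputs.back()`). The numel search refuses to choose when several outputs match, while positional selection always picks the last slot and uses numel only as a sanity check.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one op output: a label and its numel.
struct FakeOutput {
  std::string name;  // for illustration only
  size_t numel;
};

// Old approach: pick the unique output whose numel matches. Returns
// std::nullopt when zero or several outputs match (the "ambiguous" case).
std::optional<size_t> select_by_numel(const std::vector<FakeOutput>& outs, size_t expected) {
  std::optional<size_t> found;
  for (size_t i = 0; i < outs.size(); ++i) {
    if (outs[i].numel != expected) continue;
    if (found) return std::nullopt;  // second match: ambiguous
    found = i;
  }
  return found;
}

// New approach: outputs are [k_cache, v_cache, attn_output], so the attention
// output is the last entry; numel is kept only as a sanity check.
std::optional<size_t> select_last(const std::vector<FakeOutput>& outs, size_t expected) {
  if (outs.empty()) return std::nullopt;
  const size_t idx = outs.size() - 1;
  if (outs[idx].numel != expected) return std::nullopt;  // sanity check failed
  return idx;
}
```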

Test plan

test-webgpu-native / linux-job should go green; the llama1b_prefill config now passes its numeric check instead of failing output selection. All other SDPA configs are unaffected.

Note: the SDPA replay tests in the same file use a separate cache-detection scheme and are not affected by this change.

… numel

The sdpa_with_kv_cache sweep picked the attention output among the mutating op's outputs by matching element count. For llama1b_prefill (Hq=32,Hkv=8,D=64,S=128,Cmax=512) the attention output (S*Hq*D=262144) shares its numel with both the k and v caches (Cmax*Hkv*D=262144, since S*Hq==Cmax*Hkv), so three tensors matched and the test failed with 'ambiguous attention output'. The op returns [k_cache, v_cache, attn_output] in a fixed order, so select the last output instead and keep a numel sanity check. Test-only; the kernel is correct.

pytorch-bot · 2026-06-15T14:54:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20282

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 79 Cancelled Jobs, 74 Pending

As of commit 87ab706 with merge base 06143cb:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

Build Presets / apple (ios) / build (gh)
Build Presets / apple (macos) / build (gh)
Build Presets / apple (profiling) / build (gh)
##[error]The operation was canceled.
Build Presets / linux (linux, linux.2xlarge, executorch-ubuntu-22.04-clang12) / build (gh)
##[error]The operation was canceled.
Build Presets / linux (llm, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / build (gh)
##[error]The operation was canceled.
Build Presets / linux (pybind, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / build (gh)
Build Presets / windows (windows) / build (gh)
Cadence Build & Test / cpu-build / build (gh)
Cadence Build & Test / vision-build / vision (gh)
pull / android / run-emulator (gh)
pull / test-arm-backend-no-driver (test_pytest_ops_no_target) / linux-job (gh)
pull / test-arm-backend-no-driver (test_pytest_ops_tosa) / linux-job (gh)
pull / test-arm-cortex-m-size-test (zephyr-preset) / linux-job (gh)
##[error]The operation was canceled.
pull / test-build-wasm-linux / linux-job (gh)
pull / test-eval_llama-wikitext-linux / linux-job (gh)
##[error]The operation was canceled.
pull / test-llama_runner_eager-linux / linux-job (gh)
pull / test-llama-runner-linux (fp32, xnnpack+custom+qe, linux.arm64.2xlarge, executorch-ubuntu-22.04-gc... / linux-job (gh)
##[error]The operation was canceled.
pull / test-llama-runner-linux (fp32, xnnpack+custom+quantize_kv, linux.2xlarge, executorch-ubuntu-22.04... / linux-job (gh)
pull / test-llama-runner-linux (fp32, xnnpack+custom+quantize_kv, linux.arm64.2xlarge, executorch-ubuntu... / linux-job (gh)
pull / test-llama-runner-linux (fp32, xnnpack+quantize_kv, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh)
pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
pull / test-lora-linux / linux-job (gh)
##[error]The operation was canceled.
pull / test-lora-multimethod-linux / linux-job (gh)
pull / test-models-linux (add_mul, portable, linux.2xlarge) / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux (add_mul, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux (emformer_transcribe, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (ic3, portable, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (ic3, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (ic4, xnnpack-quantization-delegation, linux.4xlarge.memory) / linux-job (gh)
pull / test-models-linux (linear, portable, linux.2xlarge) / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux (mobilebert, portable, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (mobilebert, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (mv2, portable, linux.2xlarge) / linux-job (gh)
pull / test-models-linux (phi_4_mini, portable, linux.4xlarge.memory) / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux (w2l, portable, linux.4xlarge.memory) / linux-job (gh)
pull / test-models-linux-basic (mv3, portable, cmake, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11... / linux-job (gh)
pull / test-models-linux-basic (mv3, xnnpack-quantization-delegation, buck2, linux.2xlarge, executorch-u... / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux-basic (mv3, xnnpack-quantization-delegation, cmake, linux.arm64.2xlarge, execut... / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux-basic (vit, portable, cmake, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh)
pull / test-models-linux-basic (vit, portable, cmake, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11... / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux-basic (vit, xnnpack-quantization-delegation, buck2, linux.2xlarge, executorch-u... / linux-job (gh)
pull / test-models-linux-basic (vit, xnnpack-quantization-delegation, cmake, linux.2xlarge, executorch-u... / linux-job (gh)
##[error]The operation was canceled.
pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
pull / test-openvino-linux / linux-job (gh)
pull / test-phi-3-mini-runner-linux / linux-job (gh)
pull / test-qnn-delegate-linux / linux-job (gh)
pull / test-qnn-models-linux (dl3) / linux-job (gh)
pull / test-qnn-models-linux (mv3) / linux-job (gh)
pull / test-qnn-passes-linux / linux-job (gh)
pull / test-qnn-python-imports-linux / linux-job (gh)
pull / test-qnn-testsuite-linux / package-golden-artifacts (gh)
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh)
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, operators) / linux-job (gh)
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
##[error]The operation was canceled.
pull / test-qnn-wheel-packages-linux (3.13) / linux-job (gh)
##[error]The operation was canceled.
pull / test-quantized-aot-lib-linux / linux-job (gh)
pull / test-samsung-models-linux / linux-job (gh)
pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh)
pull / test-vulkan-operators-linux / linux-job (gh)
pull / unittest / linux / linux-job (gh)
##[error]The operation was canceled.
pull / unittest / macos / macos-job (gh)
pull / unittest-buck / linux / linux-job (gh)
pull / unittest-buck / macos / macos-job (gh)
pull / unittest-editable / linux / linux-job (gh)
pull / unittest-editable / macos / macos-job (gh)
pull / unittest-wasm-bindings (--enable-etdump) / linux-job (gh)
##[error]The operation was canceled.
Test ARM Backend / test-arm / package-golden-artifacts (gh)
Test ARM Backend / test-arm / test-backend-linux (arm_tosa_fp, models) / linux-job (gh)
##[error]The operation was canceled.
Test ARM Backend / test-arm / test-backend-linux (arm_vgf_fp, models) / linux-job (gh)
Test ARM Backend / test-arm / test-backend-linux (arm_vgf_fp, operators) / linux-job (gh)
Test QNN Backend / test-qnn / package-golden-artifacts (gh)
Test QNN Backend / test-qnn / test-backend-linux (qnn, operators) / linux-job (gh)
Test Vulkan Backend / test-vulkan / package-golden-artifacts (gh)
Test Vulkan Backend / test-vulkan / test-backend-linux (vulkan, models) / linux-job (gh)
Test Vulkan Backend / test-vulkan / test-backend-linux (vulkan, operators) / linux-job (gh)
##[error]The operation was canceled.
Test WebGPU Backend / test-webgpu / package-golden-artifacts (gh)
Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, models) / linux-job (gh)
##[error]The operation was canceled.
Test XNNPACK Backend / test-xnnpack / package-golden-artifacts (gh)
Test XNNPACK Backend / test-xnnpack / test-backend-linux (xnnpack, operators) / linux-job (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-15T14:55:34Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Fixes the WebGPU native SDPA test harness by selecting the attention output by output position (fixed output order) rather than by element count, avoiding ambiguity when caches and attention output share the same numel (e.g. llama1b_prefill).

Changes:

Select SDPA attention output as the last output (outputs.back()) instead of searching by numel.
Add a numel sanity check on the selected output and update related failure messages.


…numel asserts

Use the documented ExecuTorch output order [*mutated_inputs, *user_outputs] = [k_cache, v_cache, attn_output] and select the attention output by position (index 2) instead of by element count. Assert outputs.size()==3, that each output is a tensor, and that the two cache slots have numel Cmax*Hkv*D, so a future change in output arity/order fails loudly. Print numel as %zu/size_t to match the rest of the file and avoid truncation. Addresses Copilot review comments.
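The hardened checks in this commit can be sketched as a single validation function. `OutputInfo` and `validate_sdpa_outputs` are hypothetical names for illustration: they reduce each output to an is-tensor flag and an element count, check the documented [k_cache, v_cache, attn_output] order, and print sizes with `%zu` so `size_t` values are not narrowed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical simplified output: whether it is a tensor, and its numel.
struct OutputInfo {
  bool is_tensor;
  size_t numel;
};

// Formats a mismatch message; size_t values are printed with %zu.
static std::string fail(const char* what, size_t got, size_t want) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "%s: got %zu, expected %zu", what, got, want);
  return buf;
}

// Checks arity, tensor-ness, and per-slot numel for the order
// [k_cache, v_cache, attn_output]. Returns "" on success, otherwise
// a description of the first failure.
std::string validate_sdpa_outputs(const std::vector<OutputInfo>& outs,
                                  size_t cache_numel, size_t attn_numel) {
  if (outs.size() != 3) return fail("output count", outs.size(), 3);
  for (size_t i = 0; i < outs.size(); ++i) {
    if (!outs[i].is_tensor) {
      char buf[64];
      std::snprintf(buf, sizeof(buf), "output %zu is not a tensor", i);
      return buf;
    }
  }
  if (outs[0].numel != cache_numel) return fail("k_cache numel", outs[0].numel, cache_numel);
  if (outs[1].numel != cache_numel) return fail("v_cache numel", outs[1].numel, cache_numel);
  if (outs[2].numel != attn_numel) return fail("attn_output numel", outs[2].numel, attn_numel);
  return "";
}
```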

The replay and dynamic-input_pos decode tests had the same latent bug as the sweep: they classified outputs by element count (ne==qn attn, ne==cn cache), which is ambiguous when the attention output and caches share numel. Switch both to the documented positional order [k_cache, v_cache, attn_output] and assert 3 tensor outputs; the existing content-based k/v cache disambiguation and cross-step threading are unchanged.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

- Remove the leftover replay comment claiming the attn output has a unique numel (the whole point of this PR is that it can equal the cache numel).
- In both replay and dynamic-decode, after positional selection, assert the two cache outputs have numel cn before step-0 identification/threading read cn/kvn elements from them, turning a short/changed tensor into a clean failure instead of an out-of-bounds read.
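The bounds check in the second bullet comes down to validating the element count before any element is read. In this sketch, `CacheView` and `copy_cache` are hypothetical: `CacheView` stands in for a raw data pointer plus the numel reported by an output tensor, which is exactly the situation where reading `cn` elements from a shorter tensor would run out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a cache output: raw storage plus its reported numel.
struct CacheView {
  const float* data;  // pointer to the output tensor's storage
  size_t numel;       // element count reported by that tensor
};

// Copies `cn` elements into `dst` (e.g. to thread the cache into the next
// step) only after confirming the tensor really has `cn` elements.
// Returns false, leaving `dst` untouched, instead of reading past the end.
bool copy_cache(const CacheView& src, size_t cn, std::vector<float>& dst) {
  if (src.data == nullptr || src.numel != cn) return false;  // clean failure
  dst.assign(src.data, src.data + cn);
  return true;
}
```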

shoumikhin · 2026-06-15T17:29:23Z

Superseded by #20283, which fixes the same test-webgpu-native (Dawn) red by selecting the SDPA attention output by shape instead of element count. That landed on main first, so I'm closing this PR to avoid a redundant, conflicting change.

One thing #20283 does not cover: it only updates test_sdpa_config. The test_sdpa_replay and test_sdpa_dynamic_decode functions in the same file still select the attention output by numel and carry the same latent ambiguity (not triggered by any currently configured shape, since none has S*Hq == Cmax*Hkv). Happy to send a small follow-up that hardens those two to match the shape-based selection if that's useful.
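For comparison with the shape-based approach mentioned for #20283, here is a sketch of selecting by full shape. It assumes, without confirmation from this PR, that the attention output has shape [1, S, Hq, D] and each cache [1, Cmax, Hkv, D]; `select_by_shape` is a hypothetical helper. Under that assumption the llama1b_prefill tensors share a numel but not a shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Shape = std::vector<int64_t>;

// Picks the unique output whose full shape equals `expected`. Unlike a numel
// comparison, [1, 128, 32, 64] and [1, 512, 8, 64] are different shapes even
// though both hold 262144 elements.
std::optional<size_t> select_by_shape(const std::vector<Shape>& outs, const Shape& expected) {
  std::optional<size_t> found;
  for (size_t i = 0; i < outs.size(); ++i) {
    if (outs[i] != expected) continue;
    if (found) return std::nullopt;  // still ambiguous: more than one match
    found = i;
  }
  return found;
}
```

Under the same assumed layout, a shape comparison would only become ambiguous if S == Cmax and Hq == Hkv, whereas positional selection does not depend on shapes at all.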

Copilot AI review requested due to automatic review settings June 15, 2026 14:54

meta-cla Bot added the CLA Signed label Jun 15, 2026

shoumikhin had a problem deploying to cadence June 15, 2026 14:54 — with GitHub Actions Error

Copilot started reviewing on behalf of shoumikhin June 15, 2026 14:55

Copilot AI reviewed Jun 15, 2026


Comment thread backends/webgpu/test/test_webgpu_native.cpp Outdated

Comment thread backends/webgpu/test/test_webgpu_native.cpp

shoumikhin temporarily deployed to cadence June 15, 2026 15:07 — with GitHub Actions Inactive

Copilot AI review requested due to automatic review settings June 15, 2026 15:49

shoumikhin had a problem deploying to cadence June 15, 2026 15:49 — with GitHub Actions Error

Copilot started reviewing on behalf of shoumikhin June 15, 2026 15:50

Copilot AI reviewed Jun 15, 2026


Comment thread backends/webgpu/test/test_webgpu_native.cpp Outdated

Comment thread backends/webgpu/test/test_webgpu_native.cpp

Comment thread backends/webgpu/test/test_webgpu_native.cpp

shoumikhin temporarily deployed to cadence June 15, 2026 15:56 — with GitHub Actions Inactive

shoumikhin closed this Jun 15, 2026

shoumikhin wants to merge 4 commits into main from oncall-fix-webgpu-sdpa-output-selection
