Record and emit scaledDotProductAttention for IREE (#543) #544

Merged
michalharakal merged 1 commit into develop from feature/sdpa-tape-recording-and-hlo
Apr 23, 2026

@michalharakal
Contributor

Three changes to enable SDPA in the SKaiNET → StableHLO → IREE path:

  1. RecordingExecution: record SDPA calls with their operation and parameters. Previously the call was delegated without being recorded, as conv1d was before PR #532 ("RecordingTensorOpsDecorator drops conv1d/conv3d traces silently").

  2. TensorOperations: add ScaledDotProductAttentionOperation with inferOutputs (output shape = query shape)

  3. NeuralNetOperationsConverter: decompose SDPA into StableHLO:

    • dot_general Q @ K.T (batching_dims=[0,1], contracting_dims=[3]x[3])
    • scale + optional mask
    • softmax (max-subtract-exp-sum-div decomposition)
    • dot_general weights @ V (contracting_dims=[3]x[2])
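
For readers who want to check the decomposition in item 3 by hand, here is a minimal pure-Python reference sketch of the same four steps. It is not the converter's code. The `[batch, heads, seq, dim]` layout follows from the batching and contracting dimensions listed above; the function name `sdpa_reference`, the default scale of `1/sqrt(head_dim)`, and the additive `[seq_q, seq_k]` mask broadcast over batch and heads are assumptions made for illustration.

```python
import math


def sdpa_reference(q, k, v, mask=None, scale=None):
    """Reference SDPA over nested lists shaped [batch, heads, seq, dim].

    Hypothetical helper: mirrors the four decomposition steps, not SKaiNET's API.
    """
    head_dim = len(q[0][0][0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_dim)  # assumption: conventional SDPA scale

    out = []
    for b in range(len(q)):          # batching dim 0
        out_b = []
        for h in range(len(q[b])):   # batching dim 1
            qm, km, vm = q[b][h], k[b][h], v[b][h]

            # Step 1: Q @ K.T, contracting dim 3 of Q with dim 3 of K.
            scores = [[sum(qi[d] * kj[d] for d in range(head_dim)) for kj in km]
                      for qi in qm]

            # Step 2: scale, then add the optional mask (assumed additive).
            scores = [[s * scale for s in row] for row in scores]
            if mask is not None:
                scores = [[s + mask[i][j] for j, s in enumerate(row)]
                          for i, row in enumerate(scores)]

            # Step 3: softmax as max-subtract, exp, sum, divide.
            weights = []
            for row in scores:
                row_max = max(row)
                exps = [math.exp(s - row_max) for s in row]
                total = sum(exps)
                weights.append([e / total for e in exps])

            # Step 4: weights @ V, contracting dim 3 of weights with dim 2 of V.
            v_dim = len(vm[0])
            out_b.append([[sum(w[j] * vm[j][e] for j in range(len(vm)))
                           for e in range(v_dim)]
                          for w in weights])
        out.append(out_b)
    return out


# Usage: one batch, one head, two positions, head_dim = 2, causal mask.
x = [[[[1.0, 0.0], [0.0, 1.0]]]]
causal = [[0.0, float("-inf")], [0.0, 0.0]]
result = sdpa_reference(x, x, x, mask=causal)
```

When V has the same last dimension as Q, the result has the query's shape, which matches the `inferOutputs` rule in item 2. A row that is masked entirely with negative infinity would produce NaN in this sketch; the source does not say how the converter handles that case.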

Also includes:

  • SdpaHloExportTest: verifies tape → graph → MLIR with dot_general
  • TapeAttentionPermuteBugTest: proves raw array permute creates zero constants
  • ShapeOperationsConverter: concatenate input type annotation fix
  • ISSUE-SDPA-recording-and-hlo.md: issue documentation
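
The recording fix in item 1 can be pictured with a small decorator sketch. This is an illustration of the pattern only, written in Python rather than the project's own code; the class names `EagerOps` and `RecordingOps`, the tape entry layout, and the `scale` parameter are all hypothetical. The point is the difference between forwarding a call and forwarding it while also appending an entry to the tape.

```python
class EagerOps:
    """Hypothetical stand-in for the real tensor operations."""

    def scaled_dot_product_attention(self, q, k, v, scale=None):
        # Placeholder result; a real backend would compute attention here.
        return {"shape": q["shape"]}


class RecordingOps:
    """Hypothetical recording decorator: delegates and records each call."""

    def __init__(self, delegate):
        self.delegate = delegate
        self.tape = []

    def scaled_dot_product_attention(self, q, k, v, scale=None):
        out = self.delegate.scaled_dot_product_attention(q, k, v, scale)
        # Without this append, the call runs but leaves no trace on the tape,
        # which is the failure mode the PR description refers to.
        self.tape.append({
            "op": "scaledDotProductAttention",
            "inputs": [q, k, v],
            "params": {"scale": scale},
            "output": out,
        })
        return out


ops = RecordingOps(EagerOps())
t = {"shape": [1, 2, 4, 8]}
y = ops.scaled_dot_product_attention(t, t, t, scale=0.5)
```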

Fixes #543

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 4c8283a into develop Apr 23, 2026
9 checks passed
@michalharakal michalharakal deleted the feature/sdpa-tape-recording-and-hlo branch April 23, 2026 05:27
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-544 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.


Development

Successfully merging this pull request may close these issues.

Fix scaledDotProductAttention in recording

1 participant