
fix(ernie-image): pass attn_mask=None when text is unpadded #13802

Open
Ace3Z wants to merge 1 commit into huggingface:main from Ace3Z:fix/ernie-flash-attention



Ace3Z commented on May 24, 2026

Fixes #13801.

After pipeline.transformer.set_attention_backend("flash") on ErnieImagePipeline, the transformer's forward pass crashes with ValueError: attn_mask is not supported for flash-attn 2. The same code works on ZImagePipeline and Flux2KleinPipeline.

ErnieImageTransformer2DModel.forward always built a bool attention mask from text_lens, even when every sample had full-length text. In that case the mask is all True and carries no information, but the flash-attn 2 backend rejects any non-None attn_mask, so the call fails anyway. Z-Image's _prepare_for_attention already takes the shortcut of skipping the mask when all(seq == max_seqlen). This PR does the same.
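The shortcut described above can be sketched in plain Python. This is a hypothetical helper, not the diffusers code: the name build_text_attention_mask is made up for illustration, it uses nested lists instead of torch tensors, and it covers only the text-token positions (any image-token positions the real mask may also include are ignored).

```python
from typing import List, Optional


def build_text_attention_mask(
    text_lens: List[int], max_seqlen: int
) -> Optional[List[List[bool]]]:
    """Hypothetical sketch: per-sample boolean key mask, or None when nothing is padded."""
    # Every sample uses the full text length, so the mask would be all True.
    # Returning None lets mask-free backends (e.g. flash-attn 2) run.
    if all(seq == max_seqlen for seq in text_lens):
        return None
    # Otherwise mark real tokens True and padding positions False.
    return [[pos < seq for pos in range(max_seqlen)] for seq in text_lens]


print(build_text_attention_mask([4, 4], 4))  # None
print(build_text_attention_mask([4, 2], 4))  # [[True, True, True, True], [True, True, False, False]]
```

Returning None rather than an all-True mask means the padded case still gets a real mask on backends that accept one, while the uniform case no longer depends on mask support at all.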

The previous all-True mask was a no-op on sdpa, cudnn and native paths, so behavior on those backends is unchanged. I checked numerically: with the same seed, baseline and fix produce bit-identical output in both the uniform and padded cases.
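Why dropping an all-True mask cannot change the result: a boolean mask only sends the scores of False positions to -inf before the softmax, so when every position is True the scores are left untouched. The toy single-query attention below is a hypothetical illustration of that, not any backend's actual kernel. In this simple implementation the two outputs match bit for bit, while a mask with a padded position does change the output.

```python
import math
from typing import List, Optional


def masked_attention(
    q: List[float],
    keys: List[List[float]],
    values: List[List[float]],
    mask: Optional[List[bool]] = None,
) -> List[float]:
    """Toy single-query scaled dot-product attention; mask[j] == False excludes key j."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    if mask is not None:
        # Masked-out keys get -inf, i.e. zero weight after the softmax.
        scores = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    # Numerically stable softmax over the (possibly masked) scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

no_mask = masked_attention(q, keys, values)
all_true = masked_attention(q, keys, values, [True, True, True])
padded = masked_attention(q, keys, values, [True, True, False])
print(no_mask == all_true)  # True: an all-True mask is a no-op
print(no_mask == padded)    # False: a real padding mask changes the output
```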

Before submitting

  • This PR fixes a typo or improves the docs.
  • Read the contributor guideline.
  • Read the philosophy doc.
  • Discussed via GitHub issue (Incompatibility between FlashAttention and ERNIE Image #13801).
  • Documentation: mask construction is internal to the forward and not referenced in docs/source/en/.
  • New tests: none. Happy to add a small regression test if reviewers want one.

cc @yiyixuxu @sayakpaul

Ace3Z force-pushed the fix/ernie-flash-attention branch from 4bf7bcc to 5f37368 on May 25, 2026 08:57
github-actions bot added the size/M (PR with diff < 200 LOC) and size/S (PR with diff < 50 LOC) labels and removed the size/M and tests labels on May 25, 2026
Ace3Z force-pushed the fix/ernie-flash-attention branch from 5f37368 to fb49e96 on May 25, 2026 09:19
github-actions bot added the tests and size/M (PR with diff < 200 LOC) labels and removed the size/M label on May 25, 2026
ErnieImageTransformer2DModel.forward built a bool attention mask from
text_lens on every call, including the common case where every sample
already has full-length text. flash-attn 2 rejects any non-None
attn_mask, so set_attention_backend('flash') crashed even though the
all-True mask was effectively a no-op. Z-Image's _prepare_for_attention
takes the same shortcut.

Closes huggingface#13801
Ace3Z force-pushed the fix/ernie-flash-attention branch from fb49e96 to 84ae4b4 on May 25, 2026 10:27
github-actions bot removed the size/M (PR with diff < 200 LOC) label on May 25, 2026

Development: successfully merging this pull request may close the issue Incompatibility between FlashAttention and ERNIE Image (#13801).
