Conversation
yiyixuxu
left a comment
thanks for the PR!
i left some feedbacks
yiyixuxu
left a comment
thanks!
i left a few more comments
yiyixuxu
left a comment
thanks! left two small comments
let's merge this soon
tests/models/transformers/test_models_transformer_ernie_image.py
@claude can you do a review here also? please keep these 3 notes in mind as well during your review
I'll analyze this and get back to you.
@bot /style
Style fix is beginning... View the workflow run here.
can you add the new doc pages to https://github.com/huggingface/diffusers/actions/runs/24222913924/job/70733036127?pr=13432#step:16:80 and also run
Current code:

    from ...utils import BaseOutput
    from ..normalization import RMSNorm
    from ..attention_processor import Attention
    from ..attention_dispatch import dispatch_attention_fn
    from ..attention import AttentionMixin, AttentionModuleMixin

Suggested change:

    from ...utils import BaseOutput, logging
    from ..normalization import RMSNorm
    from ..attention_processor import Attention
    from ..attention_dispatch import dispatch_attention_fn
    from ..attention import AttentionMixin, AttentionModuleMixin

    logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
`logger` is used on line 216 below but is not currently defined.
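For context, diffusers' `logging.get_logger` follows the standard module-scoped logger pattern. A minimal stdlib sketch of the same idea (the function here is a stand-in, not the diffusers implementation):

```python
import logging

def get_logger(name: str) -> logging.Logger:
    # Stand-in for diffusers' logging.get_logger: returns a logger scoped
    # to the module name so log output can be filtered per module.
    return logging.getLogger(name)

# Mirrors the module-level line the suggestion adds.
logger = get_logger("diffusers.models.transformers.transformer_ernie_image")
logger.warning("logger is defined at module scope before first use")
```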
    torch.backends.cuda.matmul.allow_tf32 = False

    class ErnieImageTransformerTests(ModelTesterMixin, unittest.TestCase):
actually, can you write the tests in the new format, using BaseModelTesterConfig? See this PR as a reference: https://github.com/huggingface/diffusers/pull/13344/changes
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
Suggested change:

    generator=torch.Generator("cuda").manual_seed(42),
`generator` is used here but not defined in the example.
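For readers of the doc example, the fix is simply to construct the generator before the pipeline call. A small self-contained sketch of why the seeded generator matters ("cpu" is used here so the snippet runs anywhere; on GPU you would pass "cuda" as in the suggestion):

```python
import torch

# Create the generator before using it; seeding makes sampling reproducible.
generator = torch.Generator("cpu").manual_seed(42)

a = torch.randn(2, 2, generator=generator)
b = torch.randn(2, 2, generator=torch.Generator("cpu").manual_seed(42))
assert torch.equal(a, b)  # same seed, same device -> identical samples
```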
    width=1024,
    num_inference_steps=8,
    guidance_scale=5.0,
    generator=generator,
Suggested change:

    generator=torch.Generator("cuda").manual_seed(42),
Same suggestion as in #13432 (comment).
    pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image", torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    # 如果显存不足,可以开启offload

Suggested change:

    # If you are running low on GPU VRAM, you can enable offloading
nit: use the English translation of the comment since this file is in the English docs
    pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image-Turbo", torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    # 如果显存不足,可以开启offload

Suggested change:

    # If you are running low on GPU VRAM, you can enable offloading
Same as #13432 (comment).
    self.adaLN_mlp_ln = RMSNorm(hidden_size, eps=eps)
    self.mlp = ErnieImageFeedForward(hidden_size, ffn_hidden_size)

    def forward(self, x, rotary_pos_emb, shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp, attention_mask=None):
Suggested change:

    def forward(self, x, rotary_pos_emb, temb: tuple[torch.Tensor, ...], attention_mask: torch.Tensor | None = None):
        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = temb
nit: I think it would be a little cleaner if we passed all the modulation parameters as a tuple in a `temb` argument and then unpacked it inside `forward`.
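A toy sketch of the suggested calling convention. `ToyBlock` here is a minimal stand-in, not the ERNIE-Image block; it only illustrates packing the six modulation tensors into one `temb` tuple and unpacking them inside `forward`:

```python
import torch
from torch import nn

class ToyBlock(nn.Module):
    # Minimal stand-in for a transformer block with adaLN-style modulation.
    def __init__(self, hidden_size: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor, temb: tuple) -> torch.Tensor:
        # Modulation parameters arrive packed in a single tuple argument.
        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = temb
        h = self.norm(x) * (1 + scale_msa) + shift_msa  # adaLN-style modulation
        return x + gate_msa * h

block = ToyBlock(4)
x = torch.randn(1, 3, 4)
temb = tuple(torch.zeros(1, 1, 4) for _ in range(6))  # six modulation tensors
out = block(x, temb)
```

This keeps the block's signature stable if the number of modulation parameters ever changes, since only the producer and the unpacking line need updating.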
    pe=pe,
    pe_tokenizer=pe_tokenizer,
    )
    self.vae_scale_factor = 16  # VAE downsample factor
Suggested change:

    self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels)) if getattr(self, "vae", None) else 16  # VAE downsample factor
nit: I think it would be better to try to get the VAE scale factor from the VAE config if possible so that it's easier to use different VAEs if necessary (not sure if the suggestion is exactly right).
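A self-contained sketch of the derivation, with the same caveat as the suggestion: whether the exponent needs a `-1` depends on the particular VAE architecture, so treat the formula as an assumption to verify against the actual model. The config object below is a stand-in for `self.vae.config`:

```python
from types import SimpleNamespace

def infer_vae_scale_factor(vae=None, default: int = 16) -> int:
    # Derive the spatial downsample factor from the number of VAE stages;
    # each downsampling stage halves the resolution. Whether a -1 is needed
    # in the exponent depends on the VAE, so this formula is an assumption.
    if vae is None:
        return default
    return 2 ** len(vae.config.block_out_channels)

# Stand-in config; a real pipeline would read self.vae.config instead.
vae = SimpleNamespace(config=SimpleNamespace(block_out_channels=[128, 256, 512, 512]))
factor = infer_vae_scale_factor(vae)  # 2**4 == 16 for this 4-stage config
```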
    # Latent dimensions
    latent_h = height // self.vae_scale_factor
    latent_w = width // self.vae_scale_factor
    latent_channels = 128  # After patchify
Suggested change:

    latent_channels = self.transformer.config.in_channels  # 128 after patchify
nit: get `latent_channels` from the transformer config so that the pipeline code is more robust to different transformers
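Putting both suggestions together, the latent shape can be derived entirely from configs instead of hard-coded constants. A sketch with stand-in config objects (a real pipeline would read `self.transformer.config` and `self.vae_scale_factor`):

```python
from types import SimpleNamespace

# Stand-in for self.transformer with a config exposing in_channels.
transformer = SimpleNamespace(config=SimpleNamespace(in_channels=128))
vae_scale_factor = 16

height, width = 1024, 1024
latent_h = height // vae_scale_factor             # 64
latent_w = width // vae_scale_factor              # 64
latent_channels = transformer.config.in_channels  # 128 after patchify
```

If a checkpoint ships a transformer with a different `in_channels`, the pipeline then adapts automatically instead of silently producing mismatched latents.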
dg845
left a comment
Thanks for the PR! Left a few small comments.
What does this PR do?
We have introduced a new text-to-image model called ERNIE-Image, which will soon be open-sourced to the community. This PR includes the model architecture definition, the pipeline, as well as the related documentation and test files.
Before submitting
See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.