[Feat] support AutoPipelineForText2Audio #13511

Open
RuixiangMa wants to merge 4 commits into huggingface:main from RuixiangMa:AutoPipelineForText2Audio
@RuixiangMa
Contributor

What does this PR do?

Adds support for AutoPipelineForText2Audio for Text2Audio (text-to-audio) models.


import torch
import soundfile as sf
from diffusers import AutoPipelineForText2Audio

# Resolve and load the text-to-audio pipeline for this checkpoint.
pipeline = AutoPipelineForText2Audio.from_pretrained(
    "stabilityai/stable-audio-open-1.0",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")

output = pipeline(
    "Generate a male voice reading a paragraph",
    num_inference_steps=20,
    audio_end_in_s=10.0,
)

# Transpose to (samples, channels) and cast to float32 before writing the WAV file.
audio = output.audios[0].T.float().cpu().numpy()
sf.write("audio.wav", audio, pipeline.vae.sampling_rate)

audio.wav
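
A note on the `.T` in the snippet above: it appears to be there because each generated clip comes back channel-first, while `soundfile.write` expects multichannel data frame-first, i.e. shaped (frames, channels). The sketch below illustrates that reshaping with plain Python lists so it runs without torch or soundfile; `to_frames_first` and the sample values are hypothetical and only stand in for `output.audios[0]`.

```python
def to_frames_first(channels_first):
    """Transpose [channel][sample] nested lists into [frame][channel]."""
    return [list(frame) for frame in zip(*channels_first)]


# Two channels (stereo) with four samples each; a toy stand-in for one clip.
left = [0.0, 0.1, 0.2, 0.3]
right = [0.0, -0.1, -0.2, -0.3]

frames = to_frames_first([left, right])
print(len(frames), len(frames[0]))  # 4 frames, 2 values (channels) per frame
```

With real tensors or arrays the same effect comes from a transpose, which is what `.T` does in the example.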

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions bot added the documentation, utils, pipelines, and size/L (PR with diff > 200 LOC) labels on Apr 21, 2026
Signed-off-by: Lancer <maruixiang6688@gmail.com>
RuixiangMa force-pushed the AutoPipelineForText2Audio branch from 7a7d032 to 4ac3e90 on April 21, 2026 01:53
github-actions bot added and removed the size/L (PR with diff > 200 LOC) label on Apr 21, 2026
dg845 requested review from dg845 and sayakpaul on May 29, 2026 03:30
Member

@sayakpaul sayakpaul left a comment

Thanks! I think it'd be nice to also have an auto class for T2V ones.

```diff
- There are three types of [AutoPipeline](../api/models/auto_model) classes, [`AutoPipelineForText2Image`], [`AutoPipelineForImage2Image`] and [`AutoPipelineForInpainting`]. Each of these classes have a predefined mapping, linking a pipeline to their task-specific subclass.
+ There are four types of [AutoPipeline](../api/models/auto_model) classes, [`AutoPipelineForText2Image`], [`AutoPipelineForImage2Image`], [`AutoPipelineForInpainting`] and [`AutoPipelineForText2Audio`]. Each of these classes have a predefined mapping, linking a pipeline to their task-specific subclass.
```
Member

(nit): We could bullet them at this point, I think.

Contributor Author

Good point. I updated this section to a bullet list for readability.
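
For readers following the docs wording discussed in this thread, the "predefined mapping" can be pictured as a dictionary from a model family name to a pipeline class, searched by the pipeline class name that a checkpoint records in its config. The sketch below is a minimal illustration under that assumption, not the code in this PR: `TEXT2AUDIO_MAPPING`, `resolve_task_class`, and both stub classes are hypothetical stand-ins, and the real mapping may use different names and entries.

```python
from collections import OrderedDict


class StableAudioPipeline:
    """Stub standing in for a real pipeline class (no diffusers import)."""


class OtherAudioPipeline:
    """Second hypothetical stub, only here to give the mapping two entries."""


# Assumed name and contents; the mapping added by the PR may differ.
TEXT2AUDIO_MAPPING = OrderedDict(
    [
        ("stable-audio", StableAudioPipeline),
        ("other-audio", OtherAudioPipeline),
    ]
)


def resolve_task_class(mapping, class_name):
    """Return the mapped pipeline class whose name equals `class_name`."""
    for pipeline_cls in mapping.values():
        if pipeline_cls.__name__ == class_name:
            return pipeline_cls
    raise ValueError(f"No text-to-audio pipeline registered for {class_name!r}")


# A checkpoint config would supply the class name; here it is hard-coded.
resolved = resolve_task_class(TEXT2AUDIO_MAPPING, "StableAudioPipeline")
print(resolved.__name__)
```

Raising on an unknown class name keeps an unsupported checkpoint from silently falling back to the wrong pipeline.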

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

> Thanks! I think it'd be nice to also have an auto class for T2V ones.

Thanks for the suggestion. I’ll add the video auto classes, including T2V/I2V, in a follow-up PR, and keep this one scoped to text-to-audio.
