Add FLUX.2 Klein Inpaint Pipeline #13050
adi776borate wants to merge 25 commits into huggingface:main
Conversation
I have to be honest, I am not getting good results at all so far, especially when working with bounding box masks, even without a reference. On the other hand, I don't have the same issues with the regular pipeline. If you ask, I can provide examples of the results I get once my GPU frees up.
Pull request overview
This PR adds the Flux2KleinInpaintPipeline to enable image inpainting capabilities for the FLUX.2 Klein model. The pipeline supports both basic text-guided inpainting and experimental reference image conditioning, addressing issue #13005.
Changes:
- Implements a new inpainting pipeline for FLUX.2 Klein with masking support
- Adds optional reference image conditioning for more controlled inpainting
- Extends `Flux2ImageProcessor` with `do_binarize` and `do_convert_grayscale` parameters for mask processing
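For context, a minimal sketch of what `do_binarize` and `do_convert_grayscale` typically mean for a mask processor. The function name, the threshold value, and the array-based input are illustrative assumptions, not the actual `Flux2ImageProcessor` API:

```python
import numpy as np

def preprocess_mask(mask: np.ndarray, do_convert_grayscale: bool = True,
                    do_binarize: bool = True, threshold: float = 0.5) -> np.ndarray:
    # mask: HxWx3 uint8 RGB array (illustrative; the real processor also accepts PIL/torch inputs)
    if do_convert_grayscale and mask.ndim == 3:
        mask = mask.mean(axis=-1)  # collapse channels to a single grayscale plane
    arr = mask.astype(np.float32) / 255.0
    if do_binarize:
        arr = (arr >= threshold).astype(np.float32)  # hard 0/1 mask for inpainting
    return arr

rgb_mask = np.full((8, 8, 3), 200, dtype=np.uint8)
out = preprocess_mask(rgb_mask)
print(out.shape, np.unique(out))  # (8, 8) [1.]
```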
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/diffusers/pipelines/flux2/pipeline_flux2_klein_inpaint.py` | Main pipeline implementation with inpainting logic, mask handling, and reference image support |
| `tests/pipelines/flux2/test_pipeline_flux2_klein_inpaint.py` | Test suite covering basic inpainting functionality including different prompts, output shapes, and strength variations |
| `src/diffusers/pipelines/flux2/image_processor.py` | Enhanced image processor with binarization and grayscale conversion options |
| `src/diffusers/pipelines/flux2/__init__.py` | Export declarations for the new pipeline |
| `src/diffusers/pipelines/__init__.py` | Top-level export declarations |
| `src/diffusers/__init__.py` | Main package export declarations |
| `src/diffusers/utils/dummy_torch_and_transformers_objects.py` | Dummy object for missing dependencies |
I'd missed patchifying the mask latents earlier. With that corrected, here are some new example generations (all below use the 9B model).
@Natans8 this might improve what you observed, but I'd suggest waiting for the maintainers' review.
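For readers following along: FLUX-style pipelines rearrange latents into 2x2 patches before the transformer, so the mask latents must go through the same rearrangement to stay spatially aligned with the image latents. A rough sketch of that operation (an illustration of the idea, not the pipeline's exact helper):

```python
import torch

def patchify(latents: torch.Tensor, patch: int = 2) -> torch.Tensor:
    # (B, C, H, W) -> (B, (H/p)*(W/p), C*p*p): the standard FLUX-style token layout
    b, c, h, w = latents.shape
    latents = latents.view(b, c, h // patch, patch, w // patch, patch)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    return latents.reshape(b, (h // patch) * (w // patch), c * patch * patch)

x = torch.randn(1, 16, 32, 32)
print(patchify(x).shape)  # torch.Size([1, 256, 64])
```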
Oh, I was waiting to see if your changes fixed the issue that @Natans8 was having. I'll give it a test today and review the PR.
It is still working a little weird to me. These are my results on the latest branch:

```python
pipe = Flux2KleinInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
)
inpainted_img = pipe(
    prompt="Remove the person, keep the background unchanged",
    image=image,
    mask_image=mask_pil,  # rectangular mask
    num_inference_steps=4,
    guidance_scale=1.0,
    generator=generator,
    strength=1.0,
).images[0]
```

(result and comparison images omitted)

When trying to tweak the strength or the prompt of the inpaint pipeline I don't get much better results: sometimes it's random trees or rooms insensitive to the contents of the image, sometimes it's a black box, a white box, a slice of a wall. Of course I have to compare how it fares with 9B as well, ideally. There is also the possibility that I'm just doing something incorrectly; I do not have a lot of experience with Diffusers. But I hope this feedback will be useful.
For some reason, the demo examples people always use for inpainting are very simple ones that will never fail, for example the dog sitting on the bench from the original SD. Any inpainting where there's a clear separation between the subject and the background is very simple, and any modern model can do it without issues; you could even use generative fill from Photoshop before the AI era and it would also be good. From what I've tested, Klein almost never changes the rest of the image if you are specific in the edit, and the VAE is also very good, which makes the quality loss minimal. The use case for this pipeline (IMO) is to be able to separate subjects when they're very similar and you can't just prompt for them. In @Natans8's case you can prompt for a specific person, but it's a good enough example because it's not a simple image to inpaint and we can prompt for people. Also, I wouldn't test this with the base model since it's not worth the 50 steps and CFG. I did myself a test pipeline with differential diffusion to see if the model is capable of doing a good edit with restrictions. For all the tests I will use just the prompt
So the model is capable of doing it; now to test this pipeline. With the default strength:
I'll use just the distilled one since the results are similar, so varying the strength I got these:
With 0.9 you can see it's somewhat decent and on par with what I've seen from other inpainting pipelines. I also tried with the prompt
@adi776borate does this match your results? If this looks usable to you, we can continue and review it.
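For anyone reproducing the strength sweep above: in standard diffusers inpaint pipelines, `strength` decides how many of the scheduled steps actually run, which is why low values barely change the masked region. A sketch of the usual `get_timesteps` arithmetic, assuming this pipeline follows the same convention:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # strength scales how far into the noise schedule denoising starts
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # steps actually executed

for s in (0.5, 0.9, 1.0):
    print(s, effective_steps(4, s))  # 0.5 -> 2, 0.9 -> 3, 1.0 -> 4
```

With only 4 distilled steps, strength below ~0.25 would run zero denoising steps, which may explain some of the "nothing changes" results reported above.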
…ask spatial alignment and remove unused VAE encoding
…76borate/diffusers into feature/flux2-klein-inpaint
asomoza left a comment
left some more comments, don't forget to run:

```shell
make style
make quality
```

so we can run the tests.
```python
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
timestep = t.expand(latents.shape[0]).to(latents.dtype)
```

```python
latent_model_input = torch.cat([latents, condition_image_latents], dim=1)
```
here we're concatenating `latents` and `condition_image_latents`, but the methods that create them are passed dtypes that can differ:
`latents` uses `prompt_embeds.dtype` while `condition_image_latents` uses `self.vae.dtype`. This means that if the user changes the VAE dtype (for example, to use a higher precision for encoding or decoding), this concatenation will fail.
This is probably something we can also fix in the other PR
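A small sketch of the fix this comment suggests: cast the conditioning latents to the denoising dtype before concatenation. The tensor shapes are arbitrary; `torch.cat` raises on mixed dtypes, which is the failure described:

```python
import torch

latents = torch.randn(1, 4, 8, dtype=torch.bfloat16)                 # follows prompt_embeds.dtype
condition_image_latents = torch.randn(1, 4, 8, dtype=torch.float32)  # follows self.vae.dtype

# align dtypes so a user-chosen VAE dtype (e.g. float32 for precise encode/decode) can't break the cat
condition_image_latents = condition_image_latents.to(latents.dtype)
latent_model_input = torch.cat([latents, condition_image_latents], dim=1)
print(latent_model_input.shape, latent_model_input.dtype)  # torch.Size([1, 8, 8]) torch.bfloat16
```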
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
@asomoza I have made the suggested changes and also modified the batching logic, as this specific test was failing with the earlier code:
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Hi! If everything appears to be in order, shall we proceed with the tests? |
asomoza left a comment
thanks, left a few comments, we're close to merging now
```python
# 2.2 Preprocess reference image
processed_image_reference = None
if image_reference is not None and not (
    isinstance(image_reference, torch.Tensor) and image_reference.size(1) == self.latent_channels
```
you skip the preprocessing, but at line 1017 you set `processed_image_reference` to `None`, so if the user passes `image_reference` as tensors it skips this and never sets `processed_image_reference`
Added an else block to handle it
```python
else:
    image_latents = torch.cat([image_latents], dim=0)
```
Suggested change:

```diff
- else:
-     image_latents = torch.cat([image_latents], dim=0)
```
the `image_latents` are already the same
There was a problem hiding this comment.
Agreed, removing it
```python
else:
    # multiple images per sample in the batch
    item_ids = [x_ids]
    for _ in range(1, b_i):
        t_offset += scale
        t = torch.tensor([t_offset]).view(-1)
        item_ids.append(
            torch.cartesian_prod(t, torch.arange(height), torch.arange(width), torch.arange(1))
        )
    x_ids = torch.cat(item_ids, dim=0)  # (b_i * h * w, 4)
    x_ids = x_ids.unsqueeze(0).expand(batch_size, -1, -1)
all_image_latent_ids.append(x_ids)
t_offset += scale
```
Yes it is reachable when we pass multiple reference images for a single sample.
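To illustrate the branch under discussion: each extra reference image in a sample gets its own time offset in the position IDs so the transformer can distinguish the references. A standalone sketch of the ID construction, simplified from the quoted code (function name and float dtype are illustrative):

```python
import torch

def reference_ids(num_refs: int, height: int, width: int, scale: int = 1) -> torch.Tensor:
    # Build (t, h, w, 0) position IDs; each reference gets a distinct t offset.
    item_ids, t_offset = [], 0.0
    for _ in range(num_refs):
        t = torch.tensor([t_offset])
        item_ids.append(torch.cartesian_prod(
            t, torch.arange(height).float(), torch.arange(width).float(), torch.arange(1).float()
        ))
        t_offset += scale
    return torch.cat(item_ids, dim=0)  # (num_refs * h * w, 4)

ids = reference_ids(num_refs=2, height=4, width=4)
print(ids.shape, ids[:, 0].unique().tolist())  # torch.Size([32, 4]) [0.0, 1.0]
```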
```python
params = frozenset(
    ["prompt", "image", "image_reference", "mask_image", "height", "width", "guidance_scale", "prompt_embeds"]
)
batch_params = frozenset(["prompt", "image", "mask_image"])
```
Suggested change:

```diff
- batch_params = frozenset(["prompt", "image", "mask_image"])
+ batch_params = frozenset(["prompt", "image", "image_reference", "mask_image"])
```
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@claude can you help to do a review here?
I'll analyze this and get back to you.
@bot /style
Style bot fixed some files and pushed the changes.
@adi776borate can you run |
I intentionally held off on that. Could you please review #13299 first? I discussed this with @asomoza here.
@asomoza I have addressed your last comments and also removed some redundant dtype casts while I was at it. This should now correctly handle pre-encoded latents as source or reference image input. I think we should also update the docs to tell the user that we expect patchified latents as input.






































What does this PR do?
Fixes #13005
This PR adds the `Flux2KleinInpaintPipeline` for image inpainting using the FLUX.2 [Klein] model with optional reference image conditioning.

Examples
Basic Inpainting
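A usage sketch assembled from the snippets earlier in this thread (the 4B model id, 4 steps, and `guidance_scale=1.0` come from those comments; the CUDA device is an assumption). It is wrapped in a function and not executed here, since calling it downloads the model:

```python
def basic_inpaint(image, mask_image, prompt: str):
    # Basic text-guided inpainting with the new pipeline (sketch).
    import torch
    from diffusers import Flux2KleinInpaintPipeline  # available once this PR is merged/installed

    pipe = Flux2KleinInpaintPipeline.from_pretrained(
        "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
    ).to("cuda")  # assumes a CUDA device
    return pipe(
        prompt=prompt,
        image=image,
        mask_image=mask_image,
        num_inference_steps=4,   # distilled Klein checkpoints use few steps
        guidance_scale=1.0,
        strength=1.0,
    ).images[0]

print(callable(basic_inpaint))  # True
```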
Inpainting with Reference Image
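Likewise, a sketch for the experimental reference-conditioned path. The `image_reference` argument name comes from this PR's review comments; the model id and sampler settings come from earlier snippets in the thread, and the CUDA device is an assumption. Again wrapped in a function and not executed here:

```python
def inpaint_with_reference(image, mask_image, image_reference, prompt: str):
    # Reference-conditioned inpainting with the new pipeline (sketch).
    import torch
    from diffusers import Flux2KleinInpaintPipeline  # available once this PR is merged/installed

    pipe = Flux2KleinInpaintPipeline.from_pretrained(
        "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
    ).to("cuda")  # assumes a CUDA device
    return pipe(
        prompt=prompt,
        image=image,
        mask_image=mask_image,
        image_reference=image_reference,  # guides what to paint into the masked region
        num_inference_steps=4,
        guidance_scale=1.0,
    ).images[0]

print(callable(inpaint_with_reference))  # True
```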
Known Limitations
Generation quality may vary - Some outputs may contain artifacts. This can often be mitigated with better prompts and tuning hyperparameters (strength, guidance_scale, num_inference_steps).
Reference image conditioning is experimental - Inpainting with
image_referencemay not consistently produce desired results.These limitations may stem either from a bug in the pipeline implementation by me or from inherent constraints of the model. Feedback is appreciated.
Before submitting
documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@asomoza @sayakpaul
Anyone in the community is free to review the PR once the tests have passed.