(UI): add latents preview estimation to anima denoise #9219

Open
dunkeroni wants to merge 3 commits into invoke-ai:main from dunkeroni:anima_preview

Conversation

@dunkeroni
Collaborator

Summary

Small change to the intermediate latents passed to the UI callback: instead of always showing the current in-progress latents during denoising, use the current sigma value to estimate what the latents will look like at the final step. This lets the user see a recognizable preview earlier, and also shows when certain features appear or transform during the denoising process.
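
A short derivation of why a sigma-based estimate can recover the final latents. It assumes the usual rectified-flow interpolation between the clean latents $x_0$ and the noise $\epsilon$, with the model predicting the velocity $v$. That convention is an assumption here; the PR text only states that sigma is used for the estimate.

```latex
\begin{aligned}
x_t &= (1 - \sigma)\,x_0 + \sigma\,\epsilon, \qquad v = \epsilon - x_0 \\
x_t - \sigma\,v &= (1 - \sigma)\,x_0 + \sigma\,\epsilon - \sigma\,\epsilon + \sigma\,x_0 = x_0
\end{aligned}
```

With an exact velocity prediction the subtraction returns $x_0$ at any sigma. In practice the prediction is imperfect, so the estimate sharpens as sigma decreases.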

QA Instructions

Run an Anima txt2img generation and watch the image preview. This works with any sampler.

Previous example at 50%:
[Screenshot from 2026-05-20 21-04-04]

New example at 50%:
[Screenshot from 2026-05-20 21-10-29]

Merge Plan

Ready to merge

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions (bot) added the python (PRs that change python files) and invocations (PRs that change invocations) labels May 21, 2026
@kappacommit
Contributor

works perfectly in my testing

@Pfannkuchensack Pfannkuchensack self-assigned this May 24, 2026
@Pfannkuchensack
Collaborator

Findings

  • Medium: invokeai/app/invocations/anima_denoise.py:678-697 and invokeai/app/invocations/anima_denoise.py:625-654 regress the inpaint preview. In both the Euler and driver branches, latents_preview is computed before inpaint_extension.merge_intermediate_latents_with_init_latents(...) reinserts the init latents into latents. The PipelineIntermediateState is then sent with latents=latents_preview.squeeze(2), so the value shown to the user is the model's x0 estimate over the entire frame, including the regions that the inpaint extension is explicitly supposed to keep as the original image. Previously the callback received the post-merge latents, so masked-out regions matched the source image in the preview. Trigger scenario: any Anima inpaint workflow that supplies an inpaint mask/init image and watches the live preview. Evidence chain:

    1. _estimate_preview_latents is computed at line 678 (Euler) and line 625 (driver) using the pre-merge noise_pred/latents.
    2. The inpaint merge at lines 682-687 and lines 637-642 modifies only latents, not latents_preview.
    3. step_callback at lines 689-697 and 646-654 forwards latents_preview, never the merged state.
    4. invokeai/app/util/step_callback.py:242-245 renders whichever field is populated, so the user sees generated content over masked regions.

    To expose this issue, add a test that runs AnimaDenoiseInvocation with an inpaint_extension stub and asserts the tensor handed to step_callback reflects the merged init latents (or, equivalently, factor the preview-vs-merge ordering into a helper and unit-test that helper).

  • Low: invokeai/app/invocations/anima_denoise.py:646-654 and invokeai/app/invocations/anima_denoise.py:689-697 overload the wrong field on PipelineIntermediateState. invokeai/backend/stable_diffusion/extensions/preview.py:17-23 defines two distinct fields, latents (the current noisy sample) and predicted_original (the x0 estimate), and invokeai/app/util/step_callback.py:242-245 already prefers predicted_original when present. The new code stuffs the x0 estimate into latents= and leaves predicted_original as None. It "works" only because the consumer falls back to latents. Any future consumer that relies on the documented meaning of latents (e.g., debugging, exporting raw intermediate state, computing percent-noise) will receive an x0 estimate while believing it received the current noisy state. The fix is latents=latents.squeeze(2), predicted_original=latents_preview.squeeze(2).

    To expose this issue, add a unit test that asserts PipelineIntermediateState.latents produced by the Anima loop has variance consistent with the current sigma rather than near-zero (or directly asserts predicted_original is not None once the API is used as intended).

  • Low: invokeai/app/invocations/anima_denoise.py:709-715 introduces non-trivial math (preview = x_t - sigma * v) with no unit test. tests/app/invocations/test_anima_denoise.py only covers loglinear_timestep_shift; no test exercises _estimate_preview_latents. The function is pure, side-effect-free, and depends only on tensor arithmetic, so this is cheap coverage to add.

    To expose any regression in the formula (e.g., a future change accidentally flipping the sign or removing the fp32 promotion for bf16 inputs and triggering precision loss), add a test that calls AnimaDenoiseInvocation()._estimate_preview_latents(latents, sigma, noise_pred) with bf16 inputs and asserts both the dtype roundtrip and the closed-form result against a fp32 reference.
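
The three findings above can be illustrated together with a minimal, framework-free sketch. Plain Python lists stand in for tensors, `IntermediateStateSketch` is a hypothetical stand-in for `PipelineIntermediateState` (only the two fields named in the review), and the mask convention (1.0 = regenerate, 0.0 = keep the source) and the use of clean init latents for the preview merge are assumptions, not the actual `inpaint_extension` behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IntermediateStateSketch:
    """Hypothetical stand-in for PipelineIntermediateState (two fields only)."""

    latents: List[float]  # current noisy sample x_t
    predicted_original: Optional[List[float]] = None  # x0 estimate, if any


def estimate_preview(x_t: List[float], sigma: float, v: List[float]) -> List[float]:
    """x0 estimate using the formula quoted in the review: x_t - sigma * v."""
    return [x - sigma * p for x, p in zip(x_t, v)]


def merge_with_init(values: List[float], init: List[float], mask: List[float]) -> List[float]:
    """Assumed mask convention: 1.0 keeps the generated value, 0.0 restores init."""
    return [m * val + (1.0 - m) * ini for val, ini, m in zip(values, init, mask)]


def build_state(
    x_t: List[float],
    sigma: float,
    v: List[float],
    init_clean: Optional[List[float]] = None,
    mask: Optional[List[float]] = None,
) -> IntermediateStateSketch:
    """Populate both fields; merge the preview so kept regions show the source."""
    preview = estimate_preview(x_t, sigma, v)
    if init_clean is not None and mask is not None:
        preview = merge_with_init(preview, init_clean, mask)
    return IntermediateStateSketch(latents=list(x_t), predicted_original=preview)


def select_preview_source(state: IntermediateStateSketch) -> List[float]:
    """Mirrors the consumer described in the review: prefer predicted_original."""
    if state.predicted_original is not None:
        return state.predicted_original
    return state.latents
```

A test along these lines would build `x_t` from known clean latents and noise, then check that `latents` is left untouched, that `predicted_original` matches the clean latents in regenerated regions, and that kept regions match the source.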

Open Questions

  • Heun/multi-step drivers: in the driver branch, latents_preview is recomputed on every sub-iteration but only emitted when it.completes_user_step is true. The emitted preview is therefore the x0 estimate from the last sub-iteration (e.g., the second-order corrector for Heun). I confirmed this is mathematically reasonable, but I did not verify against invokeai/backend/anima/ driver semantics that completes_user_step is guaranteed to be true on at least one iteration per emitted step and that the it.sigma_curr at that iteration is consistent with the latents being fed into the transformer. If a driver ever yields a completes_user_step=True iteration where the noise_pred was computed at a different sigma than it.sigma_curr (e.g., a corrector step that uses a midpoint sigma), the preview would systematically misestimate x0. Worth a spot-check of the Heun and any future driver implementations.

  • First-step preview at high sigma: at the very first user step, sigma_curr ~= 1.0 and latents is pure noise, so preview = latents - 1 * noise_pred is an unconstrained x0 estimate that may look noisy/garbage rather than empty. Not a defect (this is standard RF behavior), but worth confirming the UX is not worse than the previous "raw noisy latents" preview for the first 1-2 steps before celebrating the change as a UX improvement.
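
Both open questions can be quantified with a small scalar sketch. It assumes the rectified-flow convention `x_t = (1 - sigma) * x0 + sigma * noise` with velocity `v = noise - x0`; the function name and the `v_error` parameter are hypothetical and exist only for illustration.

```python
def preview_error(
    x0: float,
    noise: float,
    sigma_actual: float,
    sigma_assumed: float,
    v_error: float = 0.0,
) -> float:
    """Return (x0 estimate - true x0) for a single scalar latent.

    sigma_actual is the sigma at which x_t was really formed,
    sigma_assumed is the sigma plugged into the preview formula,
    v_error is an additive error on the model's velocity prediction.
    """
    v_pred = (noise - x0) + v_error
    x_t = (1.0 - sigma_actual) * x0 + sigma_actual * noise
    estimate = x_t - sigma_assumed * v_pred
    return estimate - x0
```

Under these assumptions the error works out to `(sigma_actual - sigma_assumed) * v - sigma_assumed * v_error`. A sigma mismatch therefore biases the preview in proportion to the velocity, and a fixed prediction error is scaled by sigma, so it has the largest effect on the first steps, where sigma is close to 1.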

Waiting on the download to finish before real testing.

@Pfannkuchensack
Collaborator

The test shows the desired result.

@dunkeroni
Collaborator Author

Per the AI review:

  • We already do not display corrected masked areas during inpainting on Anima. Users will see what the masked areas are trending toward before correction in early steps, but those areas stabilize before the preview does anyway.
  • Yeah, can fix that. The only consumer is the UI preview, and the reason for both inputs is older models with samplers that supplied both.
  • Disagree that the math is non-trivial. It's one subtraction.
  • Substep early exits potentially disrupting the preview are not a concern.
  • First step is always noisy, also not a concern.

@Pfannkuchensack Pfannkuchensack removed their assignment May 25, 2026
@lstein lstein self-assigned this May 25, 2026
@lstein lstein added the 6.13.5 Library Updates label May 25, 2026
@lstein lstein moved this to 6.13.5 LIBRARY UPDATES in Invoke - Community Roadmap May 25, 2026
@lstein lstein added v6.13.x and removed 6.13.5 Library Updates labels May 25, 2026
@JPPhoto JPPhoto moved this from 6.13.5 LIBRARY UPDATES to 6.13.x Theme: MODELS in Invoke - Community Roadmap May 25, 2026

Labels

invocations (PRs that change invocations), python (PRs that change python files), v6.13.x

Projects

Status: 6.13.x Theme: MODELS

5 participants