Summary
Small change to the intermediate latents passed to the UI callback: instead of always showing the current in-progress latents during denoising, use the current sigma value to estimate what the latents will look like at the final step. This lets the user see a meaningful preview earlier and shows when certain features appear or transform during the denoising process.
QA Instructions
Run an Anima txt2img generation and watch the image preview. The change works with any sampler.
Previous example at 50%:
New example at 50%:
Merge Plan
Ready to merge
Checklist
The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)
Medium: invokeai/app/invocations/anima_denoise.py:678-697 and invokeai/app/invocations/anima_denoise.py:625-654 regress the inpaint preview. In both the Euler and driver branches, latents_preview is computed before inpaint_extension.merge_intermediate_latents_with_init_latents(...) reinserts the init latents into latents. The PipelineIntermediateState is then sent with latents=latents_preview.squeeze(2), so the value shown to the user is the model's x0 estimate over the entire frame, including the regions that the inpaint extension is explicitly supposed to keep as the original image. Previously the callback received the post-merge latents, so masked-out regions matched the source image in the preview. Trigger scenario: any Anima inpaint workflow that supplies an inpaint mask/init image and watches the live preview. Evidence chain:
_estimate_preview_latents is computed at line 678 (Euler) and line 625 (driver) using the pre-merge noise_pred/latents.
The inpaint merge at lines 682-687 and lines 637-642 modifies only latents, not latents_preview.
step_callback at lines 689-697 and 646-654 forwards latents_preview, never the merged state.
invokeai/app/util/step_callback.py:242-245 renders whichever field is populated, so the user sees generated content over masked regions.
To expose this issue, add a test that runs AnimaDenoiseInvocation with an inpaint_extension stub and asserts the tensor handed to step_callback reflects the merged init latents (or, equivalently, factor the preview-vs-merge ordering into a helper and unit-test that helper).
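The helper suggested above could look roughly like the following. This is a minimal sketch in plain Python lists rather than torch tensors so that it runs without dependencies; `merge_with_init`, `preview_for_callback`, and the mask convention (1 = regenerate, 0 = keep the init image) are hypothetical names and assumptions for illustration, not the actual `inpaint_extension` API.

```python
from typing import Callable, List, Optional

Latents = List[float]  # flat stand-in for a latent tensor


def merge_with_init(latents: Latents, init: Latents, mask: Latents) -> Latents:
    """Keep generated values where mask == 1 and init values where mask == 0.

    Assumed mask convention; the real extension may use the opposite one.
    """
    return [m * x + (1.0 - m) * i for x, i, m in zip(latents, init, mask)]


def preview_for_callback(
    x_t: Latents,
    sigma: float,
    velocity: Latents,
    merge: Optional[Callable[[Latents], Latents]] = None,
) -> Latents:
    """Estimate x0 first, then apply the inpaint merge to the estimate.

    Doing the merge after the estimate is the ordering the review asks for:
    regions the mask preserves show the source image in the preview.
    """
    preview = [x - sigma * v for x, v in zip(x_t, velocity)]
    if merge is not None:
        preview = merge(preview)
    return preview


# Usage: the second element is masked out, so it comes from the init latents.
init = [9.0, 7.0]
mask = [1.0, 0.0]
merged_preview = preview_for_callback(
    [1.0, 2.0], 0.5, [1.0, -2.0],
    merge=lambda p: merge_with_init(p, init, mask),
)
```

Because the ordering lives in one function, a unit test can assert it directly without stubbing the whole denoise loop.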
Low: invokeai/app/invocations/anima_denoise.py:646-654 and invokeai/app/invocations/anima_denoise.py:689-697 overload the wrong field on PipelineIntermediateState. invokeai/backend/stable_diffusion/extensions/preview.py:17-23 defines two distinct fields, latents (the current noisy sample) and predicted_original (the x0 estimate), and invokeai/app/util/step_callback.py:242-245 already prefers predicted_original when present. The new code stuffs the x0 estimate into latents= and leaves predicted_original as None. It "works" only because the consumer falls back to latents. Any future consumer that relies on the documented meaning of latents (e.g., debugging, exporting raw intermediate state, computing percent-noise) will receive an x0 estimate while believing it received the current noisy state. The fix is latents=latents.squeeze(2), predicted_original=latents_preview.squeeze(2).
To expose this issue, add a unit test that asserts PipelineIntermediateState.latents produced by the Anima loop has variance consistent with the current sigma rather than near-zero (or directly asserts predicted_original is not None once the API is used as intended).
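The proposed field assignment can be illustrated with a small stand-in. This is a sketch under stated assumptions: `IntermediateState` is a hypothetical two-field model of `PipelineIntermediateState`, and `select_preview_source` only mirrors the "prefer `predicted_original`, fall back to `latents`" behavior described above; neither is the real InvokeAI code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IntermediateState:
    """Hypothetical stand-in for PipelineIntermediateState (two fields only)."""

    latents: List[float]  # current noisy sample x_t
    predicted_original: Optional[List[float]] = None  # x0 estimate, if any


def select_preview_source(state: IntermediateState) -> List[float]:
    """Prefer the x0 estimate when present, otherwise use the noisy latents."""
    if state.predicted_original is not None:
        return state.predicted_original
    return state.latents


def build_state(x_t: List[float], sigma: float, velocity: List[float]) -> IntermediateState:
    """Populate both fields with their documented meaning."""
    x0_estimate = [x - sigma * v for x, v in zip(x_t, velocity)]
    return IntermediateState(latents=list(x_t), predicted_original=x0_estimate)


state = build_state([1.0, 2.0], 0.5, [1.0, -2.0])
preview = select_preview_source(state)  # uses the x0 estimate
raw = state.latents                     # still the noisy sample
```

With both fields filled in, the preview is unchanged for the UI while any consumer that reads `latents` still receives the current noisy state.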
Low: invokeai/app/invocations/anima_denoise.py:709-715 introduces non-trivial math (preview = x_t - sigma * v) with no unit test. tests/app/invocations/test_anima_denoise.py only covers loglinear_timestep_shift; no test exercises _estimate_preview_latents. The function is pure, side-effect-free, and depends only on tensor arithmetic, so this is cheap coverage to add.
To expose any regression in the formula (e.g., a future change accidentally flipping the sign or removing the fp32 promotion for bf16 inputs and triggering precision loss), add a test that calls AnimaDenoiseInvocation()._estimate_preview_latents(latents, sigma, noise_pred) with bf16 inputs and asserts both the dtype roundtrip and the closed-form result against a fp32 reference.
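A closed-form check of the formula might be sketched as follows. It uses plain Python floats instead of torch tensors, so it does not cover the bf16 promotion; `estimate_preview` is a hypothetical stand-in for `_estimate_preview_latents`, and the rectified-flow parameterization (x_t = (1 - sigma) * x0 + sigma * noise, with velocity v = noise - x0) is an assumption that the PR does not state explicitly.

```python
import random
from typing import List, Tuple


def estimate_preview(x_t: List[float], sigma: float, velocity: List[float]) -> List[float]:
    """preview = x_t - sigma * v, the formula quoted in the review."""
    return [x - sigma * v for x, v in zip(x_t, velocity)]


def make_noisy_sample(
    x0: List[float], noise: List[float], sigma: float
) -> Tuple[List[float], List[float]]:
    """Build x_t and the ideal velocity under the assumed parameterization."""
    x_t = [(1.0 - sigma) * a + sigma * n for a, n in zip(x0, noise)]
    velocity = [n - a for a, n in zip(x0, noise)]
    return x_t, velocity


def recovers_x0(seed: int = 0, size: int = 16, tol: float = 1e-9) -> bool:
    """Return True if the estimate matches x0 at several sigma values."""
    rng = random.Random(seed)
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for sigma in (1.0, 0.75, 0.5, 0.1):
        x_t, velocity = make_noisy_sample(x0, noise, sigma)
        estimate = estimate_preview(x_t, sigma, velocity)
        if any(abs(e - a) > tol for e, a in zip(estimate, x0)):
            return False
    return True


ok = recovers_x0()
```

With an ideal velocity the estimate equals x0 at every sigma, so a flipped sign in the formula would make this check fail for any nonzero sigma.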
Open Questions
Heun/multi-step drivers: in the driver branch, latents_preview is recomputed on every sub-iteration but only emitted when it.completes_user_step is true. The emitted preview is therefore the x0 estimate from the last sub-iteration (e.g., the second-order corrector for Heun). I confirmed this is mathematically reasonable, but I did not verify against invokeai/backend/anima/ driver semantics that completes_user_step is guaranteed to be true on at least one iteration per emitted step and that the it.sigma_curr at that iteration is consistent with the latents being fed into the transformer. If a driver ever yields a completes_user_step=True iteration where the noise_pred was computed at a different sigma than it.sigma_curr (e.g., a corrector step that uses a midpoint sigma), the preview would systematically misestimate x0. Worth a spot-check of the Heun and any future driver implementations.
First-step preview at high sigma: at the very first user step, sigma_curr ~= 1.0 and latents is pure noise, so preview = latents - 1 * noise_pred is an unconstrained x0 estimate that may look noisy/garbage rather than empty. Not a defect (this is standard RF behavior), but worth confirming the UX is not worse than the previous "raw noisy latents" preview for the first 1-2 steps before celebrating the change as a UX improvement.
We already don't display corrected masked areas during inpainting on Anima. For early steps the user will see what the masked areas are trending towards before correction, but those areas stabilize before the preview does anyway.
Yeah, can fix that. The only consumer is the UI preview, and the reason for both inputs is older models with samplers that supplied both.
Disagree that the math is non-trivial. It's one subtraction.
Substep early exits potentially disrupting the preview are not a concern.