Question about Stage 2 loss implementation

Hi, thank you for releasing this project.

I am a bit confused about the Stage 2 training objective and would appreciate your clarification.

From my understanding of the paper, the Stage 2 / Diffusion-FT loss is described as an image-space reconstruction loss, using a combination of L1 and L2 losses between the predicted image and the target image.

However, in the current implementation, Stage 2 appears to still use the diffusion noise prediction MSE loss:

```python
# src/diffvs/train_stage2_diffusion_ft.py

pred = unet(
    model_input,
    timesteps,
    encoder_hidden_states=marker_context,
    return_dict=False,
)[0]

loss = F.mse_loss(pred.float(), noise.float(), reduction="mean")
```

So the current Stage 2 objective seems to be:

```text
MSE(predicted_noise, true_noise)
```

rather than:

```text
L1(predicted_image, target_image) + L2(predicted_image, target_image)
```

I also noticed that the main difference from Stage 1 is that Stage 2 uses a fixed timestep:

```python
timesteps = torch.full(
    (target_latents.shape[0],),
    int(args.single_step_timestep),
    ...
)
```

while Stage 1 samples random timesteps.
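
To make sure I am reading this correctly, here is a minimal sketch of the difference as I understand it. The helper name `sample_timesteps` and the `device`/`dtype` arguments are my own placeholders (the elided arguments in the snippet above are not reproduced here), so please treat it as an illustration rather than the repository's code:

```python
import torch


def sample_timesteps(batch_size, num_train_timesteps, fixed_timestep=None, device="cpu"):
    """Hypothetical helper: random timesteps (Stage 1 style) or one fixed timestep (Stage 2 style)."""
    if fixed_timestep is None:
        # Stage 1 as I understand it: a random timestep per sample in [0, num_train_timesteps).
        return torch.randint(
            0, num_train_timesteps, (batch_size,), device=device, dtype=torch.long
        )
    # Stage 2 as I understand it: every sample in the batch uses the same timestep.
    return torch.full((batch_size,), int(fixed_timestep), device=device, dtype=torch.long)
```

If that reading is right, the only change between the two stages is the `fixed_timestep` argument, while the loss itself stays the same.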

Could you please clarify the following points?

1. Is the released Stage 2 implementation intended to use noise prediction MSE instead of the image-space L1 + L2 loss described in the paper?
2. If the paper loss should be used, should the implementation first reconstruct `pred_x0` from the predicted noise, decode it with the VAE, and then compute image-space L1/L2 against the target image?
3. Is this repository implementing a simplified or modified version of the Stage 2 objective?
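
To make point 2 concrete, here is a rough sketch of what I have in mind. It assumes the UNet predicts noise (epsilon prediction), that the forward process is `x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps`, and that the scheduler exposes a 1-D `alphas_cumprod` tensor. The function names and the equal loss weights are my own placeholders, not taken from the repository or the paper:

```python
import torch
import torch.nn.functional as F


def predict_x0_from_eps(noisy_latents, pred_noise, alphas_cumprod, timesteps):
    """Invert x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps to estimate x0."""
    a_bar = alphas_cumprod.to(noisy_latents.device)[timesteps].to(noisy_latents.dtype)
    a_bar = a_bar.view(-1, 1, 1, 1)  # broadcast over (C, H, W)
    return (noisy_latents - (1.0 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()


def image_space_loss(pred_image, target_image, l1_weight=1.0, l2_weight=1.0):
    """L1 + L2 reconstruction loss; the weights are assumptions, not values from the paper."""
    l1 = F.l1_loss(pred_image.float(), target_image.float(), reduction="mean")
    l2 = F.mse_loss(pred_image.float(), target_image.float(), reduction="mean")
    return l1_weight * l1 + l2_weight * l2
```

In the training loop, I imagine `pred_x0` would then be decoded by the VAE before calling `image_space_loss` against the target image. If the repository uses the diffusers `AutoencoderKL`, that would presumably look like `vae.decode(pred_x0 / vae.config.scaling_factor).sample`, but I have not verified how the latents are scaled here or whether `model_input` contains anything besides the noised target latents.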

Thank you!


