224 commits
a808242
Add data preprocess pipeline for WanGame
RandNMR73 Feb 2, 2026
b1abb5c
Update action_labels
JerryZhou54 Feb 2, 2026
f3a7a37
Overfitting running for MC 10
JerryZhou54 Feb 3, 2026
bdaf2e2
Support custom action trajectories for validation
JerryZhou54 Feb 3, 2026
ca16275
no text
mignonjia Feb 6, 2026
c077249
text
mignonjia Feb 6, 2026
7aedf3d
zero init fix
mignonjia Feb 6, 2026
a70f153
Merge branch 'wangame' into wangame-text
mignonjia Feb 6, 2026
4c1d9a9
Merge pull request #2 from mignonjia/wangame-text
mignonjia Feb 6, 2026
8cb516d
Revert "Merge pull request #2 from mignonjia/wangame-text"
mignonjia Feb 6, 2026
35bd0ad
wsad
mignonjia Feb 6, 2026
868ce50
load ckpt using safetensor
mignonjia Feb 8, 2026
8b4dec5
shuffle each epoch
mignonjia Feb 8, 2026
d4a3349
actions
mignonjia Feb 9, 2026
ecc8f56
actions
mignonjia Feb 9, 2026
abc026b
compute correct trainable params
mignonjia Feb 9, 2026
d70da3b
allow multiple data path
mignonjia Feb 9, 2026
fb219f8
update inference and wangame lingtbot
H1yori233 Feb 9, 2026
2b48211
wangame ode init
H1yori233 Feb 9, 2026
b4c5420
registry causal and ode init
H1yori233 Feb 10, 2026
4441867
update script
H1yori233 Feb 10, 2026
97c4fc6
draft code
H1yori233 Feb 10, 2026
f113791
some fix
H1yori233 Feb 10, 2026
5a767aa
update
H1yori233 Feb 10, 2026
0683e09
precommit
H1yori233 Feb 10, 2026
e4b549b
Merge remote-tracking branch 'origin/main' into wangame-distillation
H1yori233 Feb 10, 2026
fbddfba
update
H1yori233 Feb 11, 2026
8f9458d
validation
mignonjia Feb 11, 2026
8f2188d
Merge remote-tracking branch 'upstream/main' into wangame
mignonjia Feb 11, 2026
2e94cea
registry; Lingbot need to validate
mignonjia Feb 11, 2026
7fc6c89
revert back seed and scheduler
mignonjia Feb 11, 2026
6d76832
some fix
H1yori233 Feb 12, 2026
3a84f74
fix causal denoising
H1yori233 Feb 12, 2026
89a48e4
validation 81 frame
mignonjia Feb 12, 2026
64ba901
update
H1yori233 Feb 12, 2026
9a38c3d
val
mignonjia Feb 13, 2026
bc1be4c
use mg causal denoising
H1yori233 Feb 14, 2026
c486e6c
fix cache handling logic
H1yori233 Feb 15, 2026
059fd2f
add visualization
H1yori233 Feb 15, 2026
ffec3b1
try to read and design
alexzms Feb 21, 2026
98a1d53
read fastgen
alexzms Feb 21, 2026
ace0cac
designing
alexzms Feb 21, 2026
b3f9faa
phase 0
alexzms Feb 21, 2026
b936be8
progressing phase 0
alexzms Feb 21, 2026
a8bada4
phase 0 should be done
alexzms Feb 21, 2026
c4c2d89
phase0 warning
alexzms Feb 21, 2026
5720ae9
validation in method
alexzms Feb 21, 2026
c8da42e
scripts for testing phase0
alexzms Feb 21, 2026
0c1bd0d
temp launch
alexzms Feb 21, 2026
97fa792
phase 1 design
alexzms Feb 21, 2026
d6ecdad
progressing phase 1
alexzms Feb 21, 2026
7b2d8e5
phase 1 init impl
alexzms Feb 21, 2026
6134721
general distill endpoint
alexzms Feb 21, 2026
4a88606
distillation
alexzms Feb 21, 2026
ce22aea
temporary run script
alexzms Feb 21, 2026
d20753b
random generator fix
alexzms Feb 21, 2026
e36507b
Phase 1 works very well on training.
alexzms Feb 22, 2026
bd24192
dmd2 adapter comments
alexzms Feb 22, 2026
b9590f8
removing phase 0 dependency
alexzms Feb 22, 2026
8461d68
design phase 2
alexzms Feb 22, 2026
c9681ce
designing phase 2: config
alexzms Feb 22, 2026
889c1c5
designing phase 2: config 2
alexzms Feb 22, 2026
7431e95
progressing phase 2
alexzms Feb 22, 2026
f8029ad
progressing phase 2. 2
alexzms Feb 22, 2026
cef57ef
phase 2 init impl
alexzms Feb 22, 2026
7f14865
phase 2 config. training code
alexzms Feb 22, 2026
99acff3
remove all legacy dependency
alexzms Feb 22, 2026
0a3be30
fix gpu num
alexzms Feb 22, 2026
225be11
ckpt manager for phase 2
alexzms Feb 22, 2026
b58d5cd
config design
alexzms Feb 22, 2026
7d20269
designing phase 3
alexzms Feb 22, 2026
02e948b
designing phase 2.9: decoupling adapter
alexzms Feb 23, 2026
c6707c8
restart thread
H1yori233 Feb 23, 2026
11709a4
designing phase 2.9: explain why families registry
alexzms Feb 23, 2026
1e1e11f
phase 2.9 init impl
alexzms Feb 23, 2026
70157b4
wan adapter decouple
alexzms Feb 23, 2026
f12732c
removing dmd in wan adapters
alexzms Feb 23, 2026
8853110
phase2.9: adapter ang families decouple from dmd
alexzms Feb 23, 2026
074f559
doc for every file
alexzms Feb 23, 2026
e8a8371
update wangame sf
H1yori233 Feb 23, 2026
92a0db7
Merge branch 'wangame-distillation' of github-second:mignonjia/FastVi…
H1yori233 Feb 23, 2026
baac257
validation decouple from dmd and role
alexzms Feb 23, 2026
88c946d
some fix
H1yori233 Feb 23, 2026
cb09285
freeze action slurm: Doom from MC
mignonjia Feb 23, 2026
743b04d
some comment to the slurm
mignonjia Feb 23, 2026
c3c75c2
fix circular import and designing phase 2.9: validation config
alexzms Feb 24, 2026
601da9d
validator still use WanDMDPipeline. Future decoupling will be done in…
alexzms Feb 24, 2026
a8e728b
phase 2.9 config
alexzms Feb 24, 2026
8ed3869
sheidewenti? (pinyin: "whose problem?")
alexzms Feb 24, 2026
7220ebf
phase 3 design: decouple simulate_generator_forward
alexzms Feb 24, 2026
5ee043c
phase 3.1 impl
alexzms Feb 24, 2026
b985942
phase 3.2 impl
alexzms Feb 24, 2026
dd80a97
fix validator not using sampler.
alexzms Feb 24, 2026
167ddb9
fix timestep
alexzms Feb 24, 2026
72a91e8
deisigning phase 3.3 finetuning
alexzms Feb 24, 2026
97534d8
phase 3.3 init impl
alexzms Feb 24, 2026
2a68748
upload wandb file
alexzms Feb 24, 2026
4d2acfc
vsa finetune
alexzms Feb 25, 2026
be92a57
add wangame dmd distillation
H1yori233 Feb 25, 2026
ef3d699
discussing refactor
alexzms Feb 25, 2026
5f5e74f
changing config.md
alexzms Feb 25, 2026
a701455
better config
alexzms Feb 25, 2026
63c5985
merge yaml_config.py into utils/config.py
alexzms Feb 25, 2026
3d8b482
designing phase 3.4
alexzms Feb 25, 2026
8ba4271
phase 3.4
alexzms Feb 25, 2026
c67eb0d
family->model
alexzms Feb 25, 2026
9febcd7
family->model
alexzms Feb 25, 2026
7589720
a bug of encoder_hidden_states_img
mignonjia Feb 25, 2026
dc538dd
Merge remote-tracking branch 'origin/wangame' into wangame-distillation
H1yori233 Feb 25, 2026
638d955
rolemanager, dispatch.
alexzms Feb 25, 2026
a866c4a
log metrics
alexzms Feb 26, 2026
64b9fea
use flowshift=3 validator
alexzms Feb 26, 2026
7399670
tracker, utils, loader
alexzms Feb 26, 2026
6d25bea
utils, config.
alexzms Feb 26, 2026
7072dd4
rfc cn
alexzms Feb 26, 2026
cfa4318
rfc en
alexzms Feb 26, 2026
666ec36
eval
mignonjia Feb 26, 2026
e3cd0a3
select best ckpt
mignonjia Feb 26, 2026
52e629b
Merge branch 'hao-ai-lab:main' into distill1
alexzms Feb 26, 2026
2fb4655
add wangame diffusion forcing
H1yori233 Feb 26, 2026
ccc5e43
Merge remote-tracking branch 'origin/wangame' into wangame-distillation
H1yori233 Feb 26, 2026
dbe17e2
no npy
alexzms Feb 26, 2026
ac86bb2
Merge remote-tracking branch 'mignonjia/wangame' into distill1
alexzms Feb 26, 2026
ceb4de0
designing wangame import
alexzms Feb 26, 2026
de3b8b3
designing wangame: cfg
alexzms Feb 26, 2026
db37b64
wangame support distillation
alexzms Feb 26, 2026
d2387d3
dmd method cfg_uncond
alexzms Feb 26, 2026
e556321
wangame i2v pipeline support ode/sde
alexzms Feb 26, 2026
ed4f636
sde denoising stage
alexzms Feb 27, 2026
e994272
action cfg vs no cfg
alexzms Feb 27, 2026
f036db3
designing causal wangame and dfsft
alexzms Feb 27, 2026
91f260c
designing dfsft
alexzms Feb 28, 2026
c7334a2
validation config refine
alexzms Feb 28, 2026
a27d4a9
better validation config
alexzms Feb 28, 2026
82a24ee
dfdft and causal wan init impl
alexzms Feb 28, 2026
e2747ef
support other scheduler
alexzms Feb 28, 2026
27fd58e
not strict loading
alexzms Feb 28, 2026
7dff17c
Merge remote-tracking branch 'mignonjia/wangame-distillation' into di…
alexzms Feb 28, 2026
a225e15
use CausalWanGameActionTransformer3DModel on wangame causal
alexzms Feb 28, 2026
339a551
validator rollout mode
alexzms Feb 28, 2026
257495b
validator streaming causal rollout
alexzms Feb 28, 2026
679a65d
wangame support validator num_frames
alexzms Feb 28, 2026
8253a9a
fix scheduler out of bound
alexzms Feb 28, 2026
573699f
adapter -> models. config decleartion.
alexzms Feb 28, 2026
9144a43
32 gpu training slurm
alexzms Mar 1, 2026
7e5ed86
config
alexzms Mar 2, 2026
e85b786
designing new model class
alexzms Mar 2, 2026
e63e214
deprecate adapter design
alexzms Mar 2, 2026
7c1442c
reorder and structure inherit hierarchy
alexzms Mar 2, 2026
628765b
support init from ckpt
alexzms Mar 2, 2026
9d08b03
no repetitive model protocal
alexzms Mar 2, 2026
9050496
reduce concept: distillruntime
alexzms Mar 3, 2026
ba9686b
utils/optimizer, utils/validation
alexzms Mar 3, 2026
f4eb1e6
causal dmd config
alexzms Mar 3, 2026
07d8a57
4n8g finetuning
alexzms Mar 3, 2026
ced0ae5
wangame causal dmd 4n8g
alexzms Mar 3, 2026
c461e6b
sf init impl
alexzms Mar 3, 2026
2719650
designing causal rollout stuff
alexzms Mar 3, 2026
0b924e7
causal for self forcing
alexzms Mar 3, 2026
d63f673
self forcing only allows student to be causalmodel
alexzms Mar 3, 2026
07a673c
self forcing config
alexzms Mar 3, 2026
6b3ff77
better yaml tracker
alexzms Mar 3, 2026
2d82e59
safe checkpointing for wangame causal for self forcing
alexzms Mar 3, 2026
4e77535
common part of wangame
alexzms Mar 3, 2026
15c644a
remove manual branch in selfforcing
alexzms Mar 3, 2026
24bc5dc
fix gradient ckpt missing keys
alexzms Mar 3, 2026
acae34e
causal refactor 2
alexzms Mar 3, 2026
fcfae12
remove redundent varient
alexzms Mar 3, 2026
6d57a0a
designing refactor
alexzms Mar 4, 2026
5179f84
designing refactor 2
alexzms Mar 4, 2026
a82208a
designing refactor 3
alexzms Mar 4, 2026
bf43339
designing refactor 4
alexzms Mar 4, 2026
0e6d109
designing refactor 5
alexzms Mar 4, 2026
264ac26
deisgning refactor 6
alexzms Mar 4, 2026
3e1063f
refactor 7
alexzms Mar 4, 2026
163605c
refactor 8
alexzms Mar 4, 2026
4323130
refactor init impl
alexzms Mar 4, 2026
ac5a2f7
remove moe support for now
alexzms Mar 4, 2026
4724893
fastgen structure
alexzms Mar 4, 2026
c9de049
better yaml file structure
alexzms Mar 4, 2026
a114c14
method specific config move to right place
alexzms Mar 4, 2026
e04ced0
run scripts
alexzms Mar 4, 2026
dbca2a5
120 col
alexzms Mar 4, 2026
cb7247d
bugfix
alexzms Mar 4, 2026
9d06875
remove fastvideo.training_args dependency
alexzms Mar 4, 2026
42fa1ba
remove redundant loader_args
alexzms Mar 4, 2026
b6af8eb
distill->train
alexzms Mar 4, 2026
d72ea2b
distill->train
alexzms Mar 4, 2026
d6a101f
simplify. only nested config is allowed
alexzms Mar 4, 2026
37b001d
trainconfig should not be none during init
alexzms Mar 4, 2026
dd7fbb5
validation config will be included in traning config
alexzms Mar 4, 2026
b753c17
self.student.validator is guaranteed exist
alexzms Mar 4, 2026
eacb5f6
~/.claude/plans/wise-mixing-pie.md
alexzms Mar 4, 2026
08ed16e
simplifying code
alexzms Mar 4, 2026
7fe51fc
remove getting trainable using getattr
alexzms Mar 4, 2026
c8f6d08
timestep fix for dfsft
alexzms Mar 5, 2026
f94a725
finetune vsa and wangame yaml
alexzms Mar 5, 2026
f3f3629
validator vsa sparsity
alexzms Mar 5, 2026
0678846
validation callback
alexzms Mar 5, 2026
3b45f23
grad clipping callback
alexzms Mar 5, 2026
6c30e25
ema callback implementation
alexzms Mar 5, 2026
6a0032a
ema and corresponding validation
alexzms Mar 5, 2026
793d184
remove legacy bundle design
alexzms Mar 6, 2026
56a8ee8
better checkpointing
alexzms Mar 6, 2026
4bbb27e
entry point, dcp to diffuers conversion.
alexzms Mar 6, 2026
093e130
validation allow no video.
alexzms Mar 6, 2026
175dbb9
fix gradient ckpting and dit precision not identified problem
alexzms Mar 6, 2026
ffe7ed8
minor fixing
alexzms Mar 6, 2026
5291ffd
dispatch->builder
alexzms Mar 6, 2026
b2b7de3
grad norm must be set in callback system
alexzms Mar 6, 2026
4e3bbec
minor
alexzms Mar 6, 2026
8d83d47
remove doc
alexzms Mar 6, 2026
5fe161e
example yaml
alexzms Mar 6, 2026
a4b9691
solver -> target. remove legact valiadition key
alexzms Mar 6, 2026
7c2648a
move device from build pre to init
alexzms Mar 6, 2026
1d86df3
minor config
alexzms Mar 6, 2026
c9bfc65
fix dmd2 and selfforcing cfg
alexzms Mar 6, 2026
9b69bbe
rfc and dmd minor
alexzms Mar 6, 2026
68c08d1
remove npys
alexzms Mar 6, 2026
449c009
remove dev doc, remove other phases doc
alexzms Mar 6, 2026
850dfd8
resolve
alexzms Mar 6, 2026
3893939
delete wangame files
mignonjia Mar 6, 2026
a72bd9c
revise config
mignonjia Mar 7, 2026
2f29a79
Delete visualize_trajectory.py
mignonjia Mar 7, 2026
4 changes: 3 additions & 1 deletion .gitignore
@@ -32,7 +32,9 @@ env
**.pyc
**.txt
*.log
*.npy
weights/
slurm_outputs/

# SSIM test outputs
fastvideo/tests/ssim/generated_videos/
@@ -82,4 +84,4 @@ docs/distillation/examples/
!assets/videos/**/*.mp4

dmd_t2v_output/
preprocess_output_text/
preprocess_output_text/
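The new ignore rules (`*.npy`, `weights/`, `slurm_outputs/`) can be spot-checked with a small glob-matching sketch. This only approximates git's matcher via `fnmatch` (real `.gitignore` semantics are richer, e.g. directory-only and negation patterns), and the example paths are made up:

```python
# Approximate check of the new .gitignore patterns using fnmatch globbing.
# NOTE: this is a simplification of git's matching, not git's actual logic.
from fnmatch import fnmatch

patterns = ["*.npy", "weights/*", "slurm_outputs/*"]

def is_ignored(path: str) -> bool:
    """True if any ignore pattern matches the given relative path."""
    return any(fnmatch(path, p) for p in patterns)
```

For authoritative answers, `git check-ignore <path>` inside the repository uses git's real matcher.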
91 changes: 91 additions & 0 deletions examples/train/distill_wan2.1_t2v_1.3B_dmd2.yaml
@@ -0,0 +1,91 @@
# DMD2 distillation: Wan 2.1 T2V 1.3B (teacher 50-step -> student 8-step).
#
# - Teacher: frozen pretrained Wan 2.1 T2V 1.3B
# - Student: trainable, initialized from the same pretrained weights
# - Critic: trainable, initialized from the same pretrained weights
# - Validation: 8-step SDE sampling

models:
  student:
    _target_: fastvideo.train.models.wan.WanModel
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
  teacher:
    _target_: fastvideo.train.models.wan.WanModel
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: false
    disable_custom_init_weights: true
  critic:
    _target_: fastvideo.train.models.wan.WanModel
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
    disable_custom_init_weights: true

method:
  _target_: fastvideo.train.methods.distribution_matching.dmd2.DMD2Method
  rollout_mode: simulate
  generator_update_interval: 5
  real_score_guidance_scale: 4.5
  dmd_denoising_steps: [1000, 850, 700, 550, 350, 275, 200, 125]

  # Critic optimizer (required; no fallback to training.optimizer)
  fake_score_learning_rate: 8.0e-6
  fake_score_betas: [0.0, 0.999]
  fake_score_lr_scheduler: constant

training:
  distributed:
    num_gpus: 8
    sp_size: 1
    tp_size: 1
    hsdp_replicate_dim: 1
    hsdp_shard_dim: 8

  data:
    data_path: data/Wan-Syn_77x448x832_600k
    dataloader_num_workers: 4
    train_batch_size: 1
    training_cfg_rate: 0.0
    seed: 1000
    num_latent_t: 20
    num_height: 448
    num_width: 832
    num_frames: 77

  optimizer:
    learning_rate: 2.0e-6
    betas: [0.0, 0.999]
    weight_decay: 0.01
    lr_scheduler: constant
    lr_warmup_steps: 0

  loop:
    max_train_steps: 4000
    gradient_accumulation_steps: 1

  checkpoint:
    output_dir: outputs/wan2.1_dmd2_8steps
    training_state_checkpointing_steps: 1000
    checkpoints_total_limit: 3

  tracker:
    project_name: distillation_wan
    run_name: wan2.1_dmd2_8steps_cfg4.5

  model:
    enable_gradient_checkpointing_type: full

callbacks:
  grad_clip:
    max_grad_norm: 1.0
  validation:
    pipeline_target: fastvideo.pipelines.basic.wan.wan_pipeline.WanPipeline
    dataset_file: examples/training/finetune/Wan2.1-VSA/Wan-Syn-Data/validation_4.json
    every_steps: 50
    sampling_steps: [8]
    sampler_kind: sde
    sampling_timesteps: [1000, 850, 700, 550, 350, 275, 200, 125]
    guidance_scale: 6.0

pipeline:
  flow_shift: 8
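Configs like the one above locate classes through dotted `_target_` paths, in the spirit of Hydra-style config systems. A minimal sketch of how such a path can be resolved; the `instantiate` helper and its kwarg handling are illustrative assumptions, not FastVideo's actual loader:

```python
# Minimal sketch of "_target_"-style instantiation. This is NOT FastVideo's
# real loader; it only shows the dotted-path-to-class resolution idea.
import importlib

def instantiate(target: str, **kwargs):
    """Resolve a dotted path like 'pkg.mod.ClassName' and call it with kwargs."""
    module_path, _, attr = target.rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr)
    return cls(**kwargs)

# Example with a stdlib class standing in for a model class:
counter = instantiate("collections.Counter", red=2, blue=1)
```

In the YAML above, every key under a role besides `_target_` would become a constructor kwarg in the same way.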
208 changes: 208 additions & 0 deletions examples/train/example.yaml
@@ -0,0 +1,208 @@
# ==============================================================================
# Full configuration reference for fastvideo.train
#
# Legend:
#   [TYPED]    — parsed into a typed dataclass; fields are validated with
#                defaults. Unknown keys are silently ignored.
#   [FREE]     — free-form dict passed as-is to the target class / method.
#                Keys depend on the _target_ class constructor / method_config.
#   [RESOLVED] — parsed by PipelineConfig.from_kwargs(); auto-populated from
#                the model's config files. Only scalar overrides are useful.
# ==============================================================================

# ------------------------------------------------------------------------------
# models: [FREE]
#
# Each role is instantiated via _target_(*, training_config=..., **kwargs).
# Keys here are constructor kwargs of the _target_ class (e.g. WanModel).
# You can define any role name (student, teacher, critic, etc.).
# ------------------------------------------------------------------------------
models:
  student:
    _target_: fastvideo.train.models.wan.WanModel  # required
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers    # required: HF repo or local path
    trainable: true                                # default: true
    disable_custom_init_weights: false             # default: false
    flow_shift: 3.0                                # default: 3.0
    enable_gradient_checkpointing_type: null       # default: null (falls back to training.model)

  teacher:
    _target_: fastvideo.train.models.wan.WanModel
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: false
    disable_custom_init_weights: true

  critic:
    _target_: fastvideo.train.models.wan.WanModel
    init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
    disable_custom_init_weights: true

# ------------------------------------------------------------------------------
# method: [FREE]
#
# Instantiated via _target_(*, cfg=RunConfig, role_models=...).
# All keys besides _target_ are available in self.method_config (a plain dict).
# Keys depend entirely on the method class.
# ------------------------------------------------------------------------------
method:
  _target_: fastvideo.train.methods.distribution_matching.dmd2.DMD2Method  # required

  # --- DMD2-specific keys (read from self.method_config) ---
  rollout_mode: simulate                      # required: "simulate" or "data_latent"
  generator_update_interval: 5                # default: 1
  dmd_denoising_steps: [1000, 750, 500, 250]  # SDE timestep schedule

  # Critic optimizer (all required; no fallback)
  fake_score_learning_rate: 8.0e-6
  fake_score_betas: [0.0, 0.999]
  fake_score_lr_scheduler: constant

  # CFG conditioning policy (optional)
  # cfg_uncond:
  #   on_missing: error   # "error" or "ignore"
  #   text: keep          # "keep", "zero", "drop", "negative_prompt"
  #   image: keep         # "keep", "zero", "drop"
  #   action: keep        # "keep", "zero", "drop"

  # --- FineTuneMethod keys (if using finetune instead) ---
  # _target_: fastvideo.train.methods.fine_tuning.finetune.FineTuneMethod
  # attn_kind: vsa        # "dense" or "vsa"
  # use_ema: false

# ------------------------------------------------------------------------------
# training: [TYPED] -> TrainingConfig
#
# Every field below has a typed default. Unknown keys are ignored.
# ------------------------------------------------------------------------------
training:

  # --- training.distributed [TYPED] -> DistributedConfig ---
  distributed:
    num_gpus: 8                # default: 1
    tp_size: 1                 # default: 1
    sp_size: 1                 # default: 1 (defaults to num_gpus in loader)
    hsdp_replicate_dim: 1      # default: 1
    hsdp_shard_dim: 8          # default: -1 (defaults to num_gpus in loader)
    pin_cpu_memory: false      # default: false

  # --- training.data [TYPED] -> DataConfig ---
  data:
    data_path: data/my_dataset # default: ""
    train_batch_size: 1        # default: 1
    dataloader_num_workers: 4  # default: 0
    training_cfg_rate: 0.1     # default: 0.0
    seed: 1000                 # default: 0
    num_height: 448            # default: 0
    num_width: 832             # default: 0
    num_latent_t: 20           # default: 0
    num_frames: 77             # default: 0

  # --- training.optimizer [TYPED] -> OptimizerConfig ---
  # Note: only for the student optimizer. Critic optimizer is in method config.
  optimizer:
    learning_rate: 2.0e-6      # default: 0.0
    betas: [0.9, 0.999]        # default: [0.9, 0.999]
    weight_decay: 0.01         # default: 0.0
    lr_scheduler: constant     # default: "constant"
    lr_warmup_steps: 0         # default: 0
    lr_num_cycles: 0           # default: 0
    lr_power: 0.0              # default: 0.0
    min_lr_ratio: 0.5          # default: 0.5

  # --- training.loop [TYPED] -> TrainingLoopConfig ---
  loop:
    max_train_steps: 10000          # default: 0
    gradient_accumulation_steps: 1  # default: 1

  # --- training.checkpoint [TYPED] -> CheckpointConfig ---
  checkpoint:
    output_dir: outputs/my_run                # default: ""
    resume_from_checkpoint: ""                # default: "" (or use --resume-from-checkpoint CLI)
    training_state_checkpointing_steps: 1000  # default: 0 (disabled)
    checkpoints_total_limit: 3                # default: 0 (keep all)

  # --- training.tracker [TYPED] -> TrackerConfig ---
  tracker:
    trackers: []               # default: [] (auto-adds "wandb" if project_name is set)
    project_name: my_project   # default: "fastvideo"
    run_name: my_run           # default: ""

  # --- training.vsa [TYPED] -> VSAConfig ---
  vsa:
    sparsity: 0.0              # default: 0.0 (0.0 = disabled)
    decay_rate: 0.0            # default: 0.0
    decay_interval_steps: 0    # default: 0

  # --- training.model [TYPED] -> ModelTrainingConfig ---
  model:
    weighting_scheme: uniform   # default: "uniform"
    logit_mean: 0.0             # default: 0.0
    logit_std: 1.0              # default: 1.0
    mode_scale: 1.0             # default: 1.0
    precondition_outputs: false # default: false
    moba_config: {}             # default: {}
    enable_gradient_checkpointing_type: full  # default: null ("full" or null)

  # --- training top-level [TYPED] ---
  dit_precision: fp32          # default: "fp32" (master weight precision)
  # model_path: ...            # default: "" (auto-derived from models.student.init_from)

# ------------------------------------------------------------------------------
# callbacks: [FREE]
#
# Each callback is instantiated via _target_(*, **kwargs).
# The callback name (e.g. "grad_clip") is arbitrary — only _target_ matters.
# training_config is injected automatically (not from YAML).
# ------------------------------------------------------------------------------
callbacks:

  # --- GradNormClipCallback ---
  grad_clip:
    _target_: fastvideo.train.callbacks.grad_clip.GradNormClipCallback  # optional if using default registry
    max_grad_norm: 1.0         # default: 0.0 (0.0 = disabled)
    log_grad_norms: false      # default: false

  # --- EMACallback ---
  # ema:
  #   _target_: fastvideo.train.callbacks.ema.EMACallback
  #   type: constant           # default: "constant" ("constant", "power", "halflife")
  #   beta: 0.9999             # default: 0.9999 (for constant type)
  #   gamma: 16.97             # default: 16.97 (for power type)
  #   ema_halflife_kimg: 500.0 # default: 500.0 (for halflife type)
  #   ema_rampup_ratio: 0.05   # default: 0.05 (for halflife type)
  #   start_iter: 0            # default: 0
  #   batch_size: 1            # default: 1

  # --- ValidationCallback ---
  validation:
    _target_: fastvideo.train.callbacks.validation.ValidationCallback  # optional if using default registry
    pipeline_target: fastvideo.pipelines.basic.wan.wan_pipeline.WanPipeline  # required
    dataset_file: path/to/validation.json  # required
    every_steps: 100           # default: 100
    sampling_steps: [4]        # default: [40]
    sampler_kind: sde          # default: "ode" (use "sde" for few-step distilled models)
    scheduler_target: null     # default: null (_target_ for scheduler class, e.g.
                               #   fastvideo.models.schedulers.scheduling_flow_match_euler_discrete.FlowMatchEulerDiscreteScheduler
                               #   fastvideo.models.schedulers.scheduling_flow_unipc_multistep.FlowUniPCMultistepScheduler)
    guidance_scale: 5.0        # default: null (uses model default)
    num_frames: null           # default: null (derived from training.data)
    output_dir: null           # default: null (falls back to training.checkpoint.output_dir)
    sampling_timesteps: null   # default: null (explicit timestep list for SDE)
    rollout_mode: parallel     # default: "parallel" ("parallel" or "streaming")

# ------------------------------------------------------------------------------
# pipeline: [RESOLVED] -> PipelineConfig
#
# Parsed by PipelineConfig.from_kwargs(). Most fields are auto-populated from
# the model's config files (vae_config, dit_config, text_encoder_configs, etc.).
# Only scalar overrides are typically needed here.
# ------------------------------------------------------------------------------
pipeline:
  flow_shift: 3                # default: null (model-specific)
  # flow_shift_sr: null        # default: null (super-resolution shift)
  # embedded_cfg_scale: 6.0    # default: 6.0
  # is_causal: false           # default: false
  # vae_tiling: true           # default: true
  # vae_sp: true               # default: true
  # disable_autocast: false    # default: false
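The [TYPED] parsing rule stated in the legend (fields get validated defaults; unknown YAML keys are silently dropped) can be sketched with a plain dataclass. `OptimizerConfig` below is a stand-in for illustration, not FastVideo's real class:

```python
# Sketch of [TYPED] config parsing: declared fields keep typed defaults,
# and keys not declared on the dataclass are silently ignored.
# This stand-in mimics the described behavior; it is not the real parser.
from dataclasses import dataclass, field, fields

@dataclass
class OptimizerConfig:
    learning_rate: float = 0.0
    betas: list = field(default_factory=lambda: [0.9, 0.999])
    lr_scheduler: str = "constant"

def from_dict(cls, raw: dict):
    """Keep only keys matching declared fields; defaults fill the rest."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})

cfg = from_dict(OptimizerConfig, {"learning_rate": 2.0e-6, "bogus_key": 1})
```

This is why a misspelled key in a [TYPED] section fails silently rather than erroring, whereas a [FREE] section forwards every key to the target constructor.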